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ABSTRACT 


This thesis explores the “Infomax” method of Independent Component Analysis 
(ICA) to accomplish blind source separation (BSS). The Infomax method separates 
unknown source signals from a number of signal mixtures by maximizing the entropy of 
a transformed set of signal mixtures and is accomplished by performing gradient ascent 
in MATLAB. This work specifically focuses on small numbers of two types of signals: 
audio signals and simple communications signals (polar non-return to zero signals). The 
Infomax method is found to be successful and efficient only for small numbers of signals, 
and improvements to the gradient ascent algorithm should be made for the Infomax 
algorithm to succeed for more than three signal mixtures. MATLAB implementation 


code is included as appendices. 
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EXECUTIVE SUMMARY 


As the use of wireless communications expands, more signals are introduced into 
the environment. The pervasiveness of these signals results in overcrowding in the 
spectrum and an increasing number of overlapping signals. Multiple signals overlapping 
in time and frequency create co-channel interference. When superimposed signals are 
received, they are generally difficult to demodulate due to the influence of the interfering 


signal on the decision statistics in the receiver, resulting in inaccurate demodulation. 


In military applications, the ability to correctly demodulate received signals 
affects friendly communications capabilities as well as hostile threat assessments. Co- 
channel signals are often received as signal mixtures, although the nature of the source 
signals and the mixing process is usually unknown. The problem of finding original 


signals from a mixture of signals is called blind source separation. 


Blind source separation (BSS) is a term used to describe the method of extrication 
of underlying source signals from a set of observed signal mixtures with little or no 
information as to the nature of those source signals. Blind source separation has a variety 
of applications, including neural imaging, economic analysis, and signal processing. 
Independent Component Analysis (ICA) is a method of finding unknown source signals 


from signal mixtures, and it is just one of many solutions to the BSS problem. 


Just as ICA is one of many methods of resolving signal mixtures into their 
original component source signals, many approaches have been developed to perform 
ICA. This research focuses on the “Infomax” algorithm, which finds a number of 
independent source signals from the same number of signal mixtures by maximizing the 


entropy of the signals. 


Entropy is basically the average information obtained when the value of a random 
variable is found, and Infomax is based on the fact that the maximum entropy of joint 


continuous random variables occurs only when the random variables are statistically 


XVil 


independent. Therefore, if entropy is maximized, the resulting signals must be 
independent. If the contributing signals are independent, then these independent signals 


must be the original source signals. 


The Infomax algorithm achieves the maximum entropy of a function using 
gradient ascent, an iterative process of taking a “step” in the direction of maximum 
gradient until a local maximum is reached. If this process is repeated sufficiently, the 
global maximum will eventually be found. When the global maximum of entropy is 
found using gradient ascent, entropy has been maximized, and the resulting signals are 


the source signals. 


In this research, the Infomax algorithm was implemented in MATLAB, and the 
Infomax theory was first tested on audio signals. For small numbers of signal mixtures 
(two to three), the Infomax algorithm was found to be rather efficient. As the number of 
signal mixtures increased, however, the complexity of the algorithm increased and 
efficiency decreased. The algorithm was then adapted to a simple communications 
signal, the polar non-return to zero (NRZ) waveform. Two methods of modifying the 
Infomax algorithm to a different signal type were tested, and the superior method was 
chosen to run simulations. Again, the Infomax algorithm proved to be efficient in 
extracting small number of signals (two to three) from the same number of signals 
mixtures. As the number of signals increased, the complexity of the algorithm increased, 


resulting in longer and less accurate computations. 


The Infomax method of ICA was found to be quite successful in solving the blind 
source separation problem for small numbers of sources (both audio signals and polar 
NRZ signals). This research could be extended to larger numbers of signals, as well as 
signals of many different types. Improvements to the gradient ascent algorithm could 
improve the Infomax algorithm’s performance in both of these cases. Additionally, the 
many other methods of ICA could be tested and compared for consistency, efficiency, 
and accuracy. The applicability of this topic to a variety of research fields creates 
abundant opportunities for future work: a very exciting future for breakthroughs in ICA 


as a method of Blind Source Separation. 
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I. INTRODUCTION 


As the use of wireless communications expands, more signals are introduced into 
the environment. The pervasiveness of these signals results in overcrowding in the 
spectrum and an increasing number of overlapping signals. Multiple signals overlapping 
in time and frequency often create co-channel interference [8]. When these 
superimposed signals are received, they are generally difficult to demodulate due to the 
influence of the interfering signal on the decision statistics in the receiver [10], resulting 


in inaccurate demodulation. 


In military applications, the ability to correctly demodulate received signals 
affects friendly communications capabilities as well as hostile threat assessments. Co- 
channel signals are often received as signal mixtures, although the nature of the source 
signals and the mixing process is usually unknown. The problem of finding original 


signals from a mixture of signals is called blind source separation. 


A. BLIND SOURCE SEPARATION 


Blind source separation (BSS) is a term used to describe the method of extrication 
of underlying source signals from a set of observed signal mixtures with little or no 
information as to the nature of those source signals. Blind source separation has a variety 
of applications, including neural imaging, economic analysis, and signal processing. A 


classic example of blind source separation is the “cocktail party problem [12], [6].” 


1. The Cocktail Party Problem 


The cocktail party problem considers the example of a room full of people 
speaking simultaneously. Microphones (equal to the number of people in the room) are 
scattered throughout the room, and each microphone records a mixture of all the voices in 
the room. The problem, then, is to separate the voices of the individual speakers using 
only the recorded mixtures of their voices. A simplified version of the cocktail party is 


illustrated in Figure 1. 


Source 1 
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Ww 
vw Q Source 3 





Source 3 





A 


Source 4 


Source 4 


Figure 1 The “Cocktail Party” problem. 


While this figure is simplified in that there are only four “attendees” at the “party”, it is 
also evident how complicated the problem becomes as the number of source signals and 
signal mixtures increases. Independent Component Analysis (ICA) is one of many 


methods of addressing the problem of blind source separation. 


B. OBJECTIVES AND OUTLINE 


This thesis is intended to provide a framework for future research into the topic of 
Independent Component Analysis at the Naval Postgraduate School. Research objectives 


are as follows. 


1. Research ICA Methods 


A variety of ICA methods were reviewed, and a single method was chosen for 
further analysis. Chapter II contains a history of ICA and a literature review. The 
Infomax method was chosen based on the more comprehensive introductory material 
available (mainly in the form of Stone’s book [12]), as well as the similarity in its 


resultant equation to other methods. 


2. Analyze Infomax Method 


Chapter III presents a detailed analysis of the Infomax method of ICA. The main 
concepts of Infomax are defined, and the equations necessary for the implementation of 


the Infomax algorithm are derived. 


3. Implement Infomax Algorithm in MATLAB 


Basic MATLAB code for the Infomax method is provided in Appendix D of [12], 
and this code was modified and adapted for convenience and expansion of applications. 
Chapter IV describes the process of tailoring and improving code for the basic algorithm, 


as well as describing the results of the algorithm. 


4. Draw Infomax Conclusions and Outline Future Research 


The capabilities of the Infomax method as it relates to signal processing are 
addressed in Chapter V, and the algorithm was found to be very successful in extracting 
small numbers of signals. Potential future research topics, including adapting the 
algorithm for larger numbers of signals and testing on different types of signals, are 


addressed as well. 
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I. BACKGROUND 


Independent component analysis attempts to extract M signals y from M signal 


mixtures x by optimizing an unmixing matrix W, where [12] 


y = Wx (1) 
In vector-matrix notation 
1 2 3 N I 7) 3 7 
yy y yi ste: Ag = Ye Wi Wi “Wig x x; x; cs eee, 
1 2 3 N 1 4 3 7 
yo y5 y; Bes Gee = W> W5 ss Woy xX, xX, x; aS, ER (2) 
! Z 2 N 1 2 3 N 
Ym Yu Yu “° °° Yu Wu1 War. "° Wyu IL*%*u %m Xm Ot Xu 


where the subscripts of x and y indicate the signal number and the superscripts are the 


time index. 


The principle of ICA was first developed in the early 1980’s by Herault, Jutten, 
and Ans, neurophysiologists studying muscle contraction. They observed two responses 


x,(t) and x,(t), from which they obtained position and velocity signals y,(t) and y,(t) 


[6]. Utilizing a technique of non-linear decorrelation, they showed that independent 
components could be found as “nonlinearly uncorrelated linear combinations” [6]. While 
this is the first known adaptation of ICA, it focused only on two signals and is less 
efficient than the more modern approaches [6]. Other early work on ICA included the 
work of Cichocki and Unbehauen, who developed the algorithm to solve Equation (1) 


and applied it to neural networks [4]. 


In the early 1990’s, Principal Component Analysis (PCA) and Projection Pursuit, 
both similar methods to ICA, were applied to the blind source separation problem 
[6][12]. Principal Component Analysis seeks sources that are Gaussian and uncorrelated, 
rather than the non-Gaussian, independent sources of ICA [12]. As uncorrelatedness is 
not as strong a property as independence, PCA is not as robust as ICA but is less 
computationally challenging. Additionally, PCA can be utilized as a precursor to the 
ICA algorithm [6]. Projection Pursuit seeks one independent component at a time from a 


set of mixtures by maximizing kurtosis to find the most non-Gaussian signal [12]. This 
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differs from ICA in that projection pursuit extracts one signal from a mixture of M 


signals at a time, while ICA extracts all M signals at once [12]. 


In 1995, A.J. Bell and T.J. Seyjnowski developed an “‘information-maximization 
approach to blind separation and blind deconvolution,” which is now referred to as 
Infomax [2]. Infomax is a method of finding mutually independent signals by 
maximizing information-flow, or entropy. Infomax yields the same result as another ICA 
method, Maximum Likelihood Estimation (MLE) [6], [12]. MLE optimizes parameter 
values to find the best fit of observed data given some model (MLE optimizes W to find 


the best fit of the extracted signals y to the source signals s) [6], [12]. Both Infomax 


and MLE make several assumptions about the source signals, the validity of which is 


explored in Chapter III. 


In Infomax and MLE, as well as PCA and projection pursuit, optimization is 
achieved by gradient ascent. Gradient ascent is an optimization method that maximizes a 
function of multiple parameters by iteratively improving an initial guess using the 
gradient, which points in the direction of maximum slope [12]. A disadvantage of this 
method is that if the step size, or learning rate, is not chosen carefully, the function does 
not converge properly. Additionally, the Gradient Ascent method only finds a local 
maximum, so if the initial starting value is closer to a local maximum than the global 
maximum, the algorithm does not converge to find the maximum function value. Either 


of these situations can produce erroneous results [6]. 


Slightly more advanced methods of ICA include complexity pursuit and FastICA. 
Complexity Pursuit describes a method of ICA that extracts signals with the least 
complexity, as a mixture of signals will be more complex than any of its source signals 
[12]. FastICA describes a fixed point algorithm that can be applied (in lieu of gradient 


ascent) to perform more efficient calculations [6]. 


Relatively recent extensions of the ICA model include applications for noisy 
environments, cases in which there are fewer mixtures than independent components, and 
circumstances where convolution is incorporated in the creation of the mixtures. 


Considerations have also been given to nonlinear mixing processes and situations where 
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the components are Gaussian and have time dependencies. Of particular interest is the 
application of ICA to telecommunications. Uses of ICA and BSS in code-division 
multiple access (CDMA) have been explored. This thesis focuses mainly on the Infomax 
and Gradient Ascent methods, and the other methods and extensions of ICA were not 


pursued [6]. 


This chapter introduced the background of ICA as a method of BSS. In Chapter 


IL, the Infomax algorithm and its implementation are analyzed. 


THIS PAGE INTENTIONALLY LEFT BLANK 


Il. ANALYSIS 


In this chapter, one independent component analysis algorithm and _ its 
implementation are presented. The mathematics in this chapter is mostly adapted from 
James V. Stone’s book, “Independent Component Analysis: A Tutorial Introduction” 


[12]. Note that in this chapter, as in [12], the notation y represents a vector function of 


time while y’ represents a sample of that vector function at specific time rf. 


A. INFOMAX STRATEGY 


Infomax is a method of ICA grounded in information theory which aims to find 
independent source signals by maximizing entropy. The details, calculations, and 
assumptions involved with the Infomax method are discussed later in this chapter, but the 
general strategy of Infomax begins with Equation (1), where the extracted signals y are 
obtained from signal mixtures x by optimizing an unmixing matrix W. Infomax holds 
that the extracted signals are source signals if they are mutually independent. While 
independence of the signals cannot be measured, entropy can. Entropy is related to 
independence in that maximum entropy implies independent signals. Therefore, the 
objective of ICA is to find the unmixing matrix W that maximizes the entropy in the 


extracted signals y. 


Entropy of the signal mixtures x is constant, but the change in entropy can be 


maximized by mapping the signals y=Wx to an alternate set of signals 
Y = g(y) = g(Wx). This mapping spreads out Y so that the change in entropy from 
x — Y can be maximized by optimizing the unmixing matrix W, and when entropy is 
maximized, the resulting signals are independent. The inverse y = g '(Y) is then taken, 
resulting in extracted signals y that are also independent. Since the extracted set of 
signals y are independent, they must be the original source signals s. The Infomax 


strategy is depicted graphically in Figure 2. 








Optimize W 


Signal for Extracted 
mixtures nee Max Entropy =}—— pe a Signals 
2 Y=g(Wx) of y=g (Y) 
y 
Y 
Figure 2 Infomax strategy. 


The aforementioned strategy is based on four properties of bounded signals which 
are discussed in detail in Section C.1 and involve the topics mutual independence, 
invertible functions, inverse functions, and entropy. Mutual independence means that the 


outcome of one event has no effect on the outcome of another. A function y= f(x) is 
invertible if every value of x corresponds to only one value of y. Entropy is discussed 


in the following section. 


B. ENTROPY 


In 1948, Claude Shannon introduced the concept of information entropy as a 
measure of uncertainty associated with a random variable [9]. Stone describes entropy as 
a “measure of uniformity of distribution such that complete uniformity equals maximum 


entropy.” Other descriptions include average surprise, or average information [13]. 


1. Information 


The information associated with the occurrence of event A is defined as 





I(A) =n - —In(Pr[A]). (3) 


Pr[ A] 
Note that the base of the logarithm is arbitrary, but the natural logarithm is used in this 


thesis for mathematical convenience. Therefore, the units of information are nats. If the 
probability of event A occurring is high (Pr[A]~1), then it contains very little 
information: 

(A) =—In(Pr[A]) » -In(1) ~0. (4) 
Conversely, if the probability of an event is very low (Pr[A] = 0) , then it contains infinite 


information: 
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(A) =—In(Pr[A]) » —In(0) = 2 (5) 
Entropy is average information, which can be obtained from the expectation. An 


expectation is essentially a weighted average and is defined in [13] as 
E{X}= 2X X(s) Pris]. (6) 
The entropy H, or expected information, is then 
H(A) = E{I(A)}= DIA, ]Pr[ 4] (7) 


where 7 represents an arbitrary number of events. For this arbitrary number of events, 


entropy H(A) is obtained by substituting Equation (3) into Equation (7), resulting in 
H(A) = E{-In(Pr[A])} = >) —In(Pr[ A], )Pr[ |] (8) 
which can be rearranged to the form 
H(A) =-)'Pr[4]in(Pr[4]). (9) 
Equation (9) is the formal definition of entropy for a set of events [13]. 


2. A Two-Event Example 


In circumstances where there are only two possible outcomes (n=2) for an event 


(Yes/No, Heads/Tails, 0/1, etc.), the probabilities of the two events sum to one, and 


Pr[ A,]+Pr[ A, ]=1. (10) 
Define 
Pr[AJ=p and Pr[A,]=1-p. (11) 
For the two event example, Equation (9) can be expressed as 
H(A) =—(Pr[A ]In(Pr[ 4, ]) + Pr[ A, ]In (Pr[ A, ])) (12) 
and substitution of Equation (11) into Equation (12) yields 
H(p)=—pI\n(p)—(— p) Ind — p). (13) 


Entropy H(p), where p takes on the values of probability ranging from zero to 
one (0 < p <1), is shown in Figure 3. It is evident from the plot that the maximum value 


of entropy is obtained when p=0.5, which, for example, could be a fair coin where a 


11 


head is as likely to result as a tail. The probability mass function of a fair coin resembles 
a uniform probability density function (pdf), illustrating that signals with a uniform pdf 


have maximum entropy. 








0 0.1 2 03 O04 05 66 O07 08 69 1 
Probability 











Figure 3 Entropy of a two-event example with equal probability. 


Entropy can also be expressed as a continuous random variable in analogy with 
Equation (9) as the limit n—>oo. The entropy of a continuous random variable A is 


defined as 


H(A) =-[" p,(@)Inp,(@da = E{-In(p,(A))}. (14) 


All expectations can be approximated by averaging a reasonably large number of 


trials. Applying this to Equation (14), we get 
1 N 
H(A)=—— Diln p,(A'), (15) 


where ¢ is a time sample and N is the number of time samples. This is the definition of 


entropy that is utilized in Infomax. 
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3. Entropy of a Univariate Probability Density Function 


Since the Infomax method obtains mutually independent signals by maximizing 
entropy, and the entropy of the signal mixtures x is constant, an expression for entropy 
of a transformed signal Y is necessary so that the change in entropy can be maximized. 
A simplified expression of entropy can be obtained by considering the univariate case, 
where the signal contains only one dependent variable. In this case, x is a random vector 


and each element of x is a different signal sampled at the same time fr. 


From Equation (15), entropy of a signal Y is 
1 N 
H(Y)=-— Dinp,(¥'), (16) 


where Y = g(y), and y is scalar function of time ( y=y(t)= y') , and the superscript ¢ 
indicates the scalar value of y at time ¢. The function g(y) is the cumulative 
distribution function (cdf) of the desired signal y and is often referred to as the “model 
cdf” of the source signals as it is chosen to extract a desired type of source signal. This is 
described in the following section. 

The univariate case is explored by modifying Equation (1) to y=w’x, where 


w’ is a single row of the unmixing matrix W, and x is a vector representing a snapshot 


of M signals in time, as shown in vector matrix notation. That is, for 





y=w'x (17) 
where 
x, 
2 % and (18) 
Xy 
then 
x, 
yew x=[W, Wy Way] i (19) 
Xy 
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Vector multiplication yields 


£ t t t 
Y FS WyX, + Wy Xy Fee + Woy Xy » (20) 


where y’ is a scalar value of one signal sampled at time t. The transformation of y’ 


through the model cdf g(y') yields the mapped value of Y, where Y is a random 


variable on the range from zero to one; 1.e., 


Y = g(y') = 8 (Wyx] + WX) t+ Woy Xt) (21) 
From Equation (16), p,(Y') is the pdf of the mapped signal Y = g(y) and is 


related to the pdf of the extracted signal y, p, (y’), as shown in Figure 4. 





























Figure 4 Transformation of y to Y, From Ref. [12]. 


me) (C) can be used to approximate 


Figure 4 illustrates how a signal y = ( cers 


its pdf (D). The transformation of its pdf through its cdf (B) yields a uniform distribution 
(A). From Figure 4, 


Py (YAY = p,(y")Ay . (22) 
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By rearranging Equation (22), we get 





a 1, Ay _ Py(y’) 
py(Y =p, eae AY . (23) 
Ay 

Since 

eae 8 as Ay—>0, (24) 

Ay dy 
Equation (23) becomes 

i Py’) 
Py (X") dv (25) 
dy 


The magnitude of the denominator of Equation (25) is taken into account for 


monotonically increasing and decreasing functions, resulting in 





(26) 








Y 
Since Y = g(y) where g(y) is the model cdf of the source signal, then o = g'(y), and 
y 


g'(y) is the pdf of the source signal p,(y). Substituting this result into Equation (26), 
we obtain 


Py") 
p,(") 


Substituting Equation (27) into Equation (16), we get a univariate expression for entropy 


DAY) = 





(27) 


in terms of the pdfs of the source and extracted signals: 





H(Y)= yh sie (28) 


; Dy? 


To solve Equation (28), an expression for the pdf of the extracted signal p,(y) 1s 


necessary, but as the discussion of the univariate case is meant as an introduction to the 


multivariate case, this is addressed in the next section. 
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4. Entropy of a Multivariate pdf 


The univariate model can be extended to a general case in which there is more 
than one random variable. From Equation (14), Entropy H is equal to 
H(A) = E{-In(p,(a))}. (29) 
This can be represented in vector notation for multiple variables, where 
A={A,A,,---,A,} and a={a,,q,,---,a,}. The resulting multivariate expression for 
entropy 1s 
H(A) =E{-In(p,(a))} (30) 
where p,(a) is the multivariate pdf of random vector A. If each a, is independent and 


identically distributed, then 
M 
Pg (@) = Py(4)Py(4))-* Py(Gy) =] [pa (@,)- (31) 
i=1 


The natural log of the multivariate pdf is 
M 
In(p,(a)) =In TT psta)) = In( p44) P4(a,)-** P4(ay))- (32) 
i=l 
From the properties of logarithms, the log of a product is equal to the sum of the logs, so 
In(p4(4,)P.4(4y)-+* P4(4y)) =In(p,(@,))+In(p,(a,)) +--+ In(py(ay,))- 33) 


Equation (33) can be rewritten as 


dln (p,(a,)). (34) 


Substituting Equation (34) into Equation (30), we get the resulting expression for entropy 


as 


H(A)= BS p.ta))| (35) 


The expectation can be estimated by taking an average, which yields an expression 


similar to the univariate result in Equation (15): 
N 1 N 


H(A)=-—YY'In(p,(a))=-— Yn 74) (36) 


i=l t=1 t=1 
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If Equation (36) is applied to the mapped signal Y transformed from the model 
cdf Y = g(y) = g( Wx), the multivariate expression of entropy of signals Y becomes 
1 N 
H(Y) = -Lin(py(¥')), (37) 


t=1 


which is the multivariate form of the univariate expression in Equation (16). 
As in the univariate case, an expression is needed for the joint pdf p,(Y') and is 


obtained by adapting Equation (26) for the multivariate case, resulting in 


P,(y) 
oy 
oy 





Py(Y)= (38) 





The denominator of Equation (38) is the Jacobian, which is examined in more 
detail in the following section. Following the logic in the univariate case, since 
Y = g(y), where g(y) is the model cdf of the source signals, then OY/Oy is the pdf 
g'(y) of the source signals, which can also be expressed as p,(y). Equation (38) can be 
rewritten as 


P,(y) 
Ply) 





py(Y)= (39) 


Substitution of Equation (39) into Equation (37) results in a multivariate expression for 


entropy in terms of both the source signal pdf p,(y) and the extracted signal pdf p,(y): 





te Py 
H(Y)= if eee 40 
o tEm{ 2 oo 


As in the univariate expression of entropy in Equation (28), this multivariate 


expression of entropy requires an expression for the pdf of the extracted signal p,(y). 
To obtain this expression, the relationship in Equation (38), which is true for any 
invertible function, is taken into account. The pdf p,(y) of the extracted signal y = Wx 
can be expressed as 


Pp, (x) 
ay . 
Ox 





P,(y) = (41) 
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As in Equation (38), the denominator of Equation (41) is the Jacobian. 


a. The Jacobian 


The Jacobian J is a scalar value which is the determinant of an MxM 


Jacobian matrix J of partial derivatives [1]. If M =2, then 


oy OY 
0 0 
joa ee (42) 
Ox | OY)  OYo 
Ox, Ox, 
Since the Jacobian J is the determinant of the Jacobian matrix J [1], 
Oy OY, 
0 0 
J=lI\= jamie — V1 Vo _ Yo (43) 
Oy, Oy,| Ox, OX, OX, OX, 
Ox, Ox, 
b. The Jacobian and the Unmixing Matrix 


An example with M = 2 is used to illustrate the relationship between the 


Jacobian matrix J and the unmixing matrix W. 


From Equation (1), y = Wx, where 


w, Ww x 
AD Way M95 Xy 
Substituting these values into Equation (1), we get 
=|" ley (45) 
Vo Wo) Wo x4 
Evaluation of this equation shows that 
Jy = WX TWX, (46) 


and 


Vo = WyX + Wy Xp - (47) 
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From Equation (42), the Jacobian matrix requires expressions for dy,/Ox,, Oy,/Ox,, 
Oy, /Ox, , and Oy, /Ox,. The first two partial derivatives are obtained from Equation (46), 


while the second two are obtained from Equation (47): 


. =W, (48) 
a =Wp (49) 
oa =W, (50) 
X 

= = Wy (51) 


Substitution of Equations (48) - (51) into the Jacobian matrix in Equation (42) yields the 


Jacobian matrix 


WwW, = W 
io 1 J (52) 
Wo, Wo 
Equation (52) is identical to the expression for W used in Equation (44). Therefore, 
WwW, «OW 
s-| 1 °|-w. (53) 
Wy, Wo 


The determinant of J is then equal to the determinant of W, so from Equation (43) 

J =|J|=|W| (54) 
The multivariate expression of entropy in Equation (40) requires an expression for 
p,(y), where P,(y) = p,(x)/|éy /Ox| as shown in Equation (41). As |ay /Ox| is the 


Jacobian, from Equation (54) 








| =J=|WI, (55) 
Ox 
and the pdf of the extracted signal can be rewritten as 
AX) 
p,(y) =P (56) 


i 
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This result leads to the expression of entropy used in the Infomax algorithm. 


5. INFOMAX Expression for Entropy 


Substituting the expression for the pdf of the extracted signals found in Equation 


(56) into the expression for entropy in Equation (40), we get 


1< p(x’) 
A(Y)=-— > In| —*-—— _ }. 57 
ou tS | poe ou 


From the properties of logarithms, this equation can be rewritten as 
1 ~ t t 
H(Y)=-——) (In p(x )-In|W|-In p,(y')). (58) 
t=] 
Distribution of the summation results in the form 


ie ie 
H(Y)=—— Din p(x’) += Dn p,(y') + In|. (59) 


t=1 
When the first part of Equation (59) is compared to the expression of entropy in Equation 


(37), it is recognized as the entropy of X: 
1 N 
H(X)=-—}'In p,(x') (60) 
N t=1 
Equation (59) can now be rewritten as 
1 N 
H(Y) = H(X)+—— Dn p(y’) +1n|W). (61) 
t=] 


Since the unmixing matrix Wthat maximizes the entropy H(Y) does not affect the 
entropy H(X), H(X) can be ignored, meaning that the unmixing matrix that maximizes 


Equation (61) also maximizes 
1 N 
h(Y) = pon p,(y') + In| W| (62) 
t=1 


where fis the time index. Equation (62) can be modified further by ignoring the ordering 


of the signals M , resulting in 


h(Y) => yin p,(y;)+In|W]. (63) 


i=l t=l 
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The W that maximizes Equation (63) maximizes the entropy in Y , implying the 
rows of Y are independent. Since y is the inverse of Y, this implies the rows of y are 
independent, which implies that W is the unmixing matrix that yields the original 
signals. Equation (63) is the fundamental equation used in the Infomax algorithm and is 


based on several assumptions. 


C. PROPERTIES AND ASSUMPTIONS 


The process shown in Section B is based on the following properties of bounded 


signals and assumptions. 


1. Properties 
a. Bounded Signals with a Uniform pdf Have Maximum Joint 
Entropy 


This is addressed in detail in Section B.2. Equation (28) at the end of 
section B.3 demonstrates this property as well. As the goal of Infomax is to optimize W 
so that the extracted signals match the source signals, an important characteristic to note 


is that W is chosen so that p,(y') = p,(y'); ie., the ratio of the pdfs is equal to one. 
From Equation (27), 


peje (64) 
Py’) 


Therefore, if the pdf of the mapped signal Y is equal to one, then the mapped signal is 





uniformly distributed on [0,1], which indicates a maximum entropy function for random 


variables bounded by [0,1]. 


b. Signals with Maximum Joint Entropy are Mutually Independent 


This is proven in Appendix B. Since entropy is additive for independent 
random events, and entropy of dependent random events is less than that of independent 
random events, maximizing entropy must yield mutually independent signals [3]. 
Sections B.3 through B.5 addressed obtaining an expression of entropy, and Section D 


describes the method to maximize entropy. 
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c. Any Invertible Function of Mutually Independent Signals Yields 
Mutually Independent Signals 


This allows W to be optimized so that g(Wx) yields independent 
signals. If g(Wx) is a function of mutually independent signals, then Y = g(Wx) also 


yields mutually independent signals. 


d. If a Function is Invertible, its Inverse is Invertible 

This property allows the extracted signals y to be obtained from the 
mutually independent signals Y= g¢(y)=g(Wx) by taking the inverse y=¢ '(Y). 
Since W is optimized so that each row of Y is independent, its inverse y is also 


independent. 


Zz: Assumptions 


The Infomax method makes the following assumptions. 


a. All Time Samples of Each Signal Are Independent 


This assumption is not realistic, but it allows M independent signals to be 
estimated over N time steps. Since communication signals violate this assumption, it is 
recognized that the Infomax method may not have universal applicability to 
communications signals, especially those signals sampled at rates much higher than the 


Nyquist rate. 


b. All Source Signals Can Be Approximated by the Same pdf 


This assumption is also unrealistic, but it is convenient and leads to useful 
results. It has been shown by [12] that the Infomax algorithm is somewhat forgiving of 


violations of this assumption. 


Cc. The Model pdf is an Exact Match for the pdf of the Source 
Signals 


The third assumption is highly unlikely, but others [2] have had success 


with Infomax despite the violation of this assumption. This is because the exact source 
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signals are unknown, so if the model pdf is an approximation of the source pdf, then the 


extracted signals are the best possible approximation of the sources signals. 


While none of the assumptions above are truly valid, all are acceptable as 
some assumptions must to be made about the source signals in order to establish a 


starting point for blind source separation. 


D. GRADIENT ASCENT 


Equation (63) gives the entropy of the transformed signals Y to within a 
constant. Now that an expression for entropy has been derived, the objective of Infomax 
is to find an unmixing matrix W that maximizes the entropy of Y, or equivalently 


maximizes h(Y) where Y= g(y)=g(Wx). Gradient ascent is the method used to 


optimize the unmixing matrix W. Essentially, gradient ascent is an iterative process of 
taking a “step” in the direction of maximum gradient until a local maximum is reached. 


Gradient ascent requires an expression for the gradient of entropy. 


1. Gradient of Entropy 
Equation (63) is rewritten for this purpose by taking the expectation over time 
rather than assuming that all time steps are independent, which results in 
M 
h(Y) = {Sn p.0v| +In|W]. (65) 
i=l 
The gradient is found by taking the partial derivative of h with respect to W, 
oh/OW , and for the purpose of simplification, the gradient is first found with respect to 
one element of W, oh/ OW,, , and is then expanded to every element. Hence, 
h “olngy, Oln|W 
0 _E »y ng'(y;) 2 | | (66) 
= ow, | OW, 
Simplification of this partial derivative takes place in two parts, treating first the first 


term and then the second term. 
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a. The First Term of Equation (66) 


To simplify the expectation in Equation (66), the partial derivative is 








examined: 
OW, 
Let 
u=g'(y,). (68) 
Equation (67) can now be expressed 
Olnu (69) 
OW, 
Using the Chain Rule [11], we get 
Olnu _1 (70) 
Ow, ou 
Equation (68) gives an expression for u , and the derivative of u is 
u'= Ou _ o8 Oi) (71) 
OW, OW, 
Replacing the expression in Equation (70) with Equations (68) and (71), we get 
Olnu 1s Og OD (72) 





ow, g'(y,) OW, 
The expression 0g '(y,) / OW,, in Equation (72) can be simplified further using the Chain 


Rule in Leibniz notation [11], 








OR _ OB OY (73) 
ow dy ow 
From Equation (73), 
OW, Oy; OW, , 
Equation (74) further simplifies to 
Og '(y;) oy; 
= " ; i . 75 
ow. Bw es 


y i 
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The expression Oy, : OW,, is one element from a mixture x. Generalizing Equations (46) 


and (47) with this expression, we get 











Oy, 
_=X,. 76 
aw, (76) 
Substituting Equation (76) into Equation (75), we obtain 
og '(y;) 
= g"(y,)x,. 77 
aw, g"(y)x; (77) 
Now substituting Equation (77) into Equation (72), we get 
Olnu 1 
=e (ey: (78) 
Ow, gy) : 


Equation (78) can now replace the expression Oln g'(y,)/OW, in the expectation in 


Equation (66), resulting in 


M " ol W 
0, 7/52 OD, La (79) 
OW, i & ‘y) ’ OlnW, 
Now g"(y,) / g'(y,) 1s further simplified for convenience. We define 
Y(y) = £2 (80) 
8 (y,) 
and Equation (79) can be expressed as 
oh tu éln|W| 
—=E Y(y,)x, -+———_. 81 
aw, 2 (y) | dlnW, (81) 


Equation (81) represents the partial derivative of entropy in Equation (66) with the first 


term fully simplified. 


b. The Second Term 


To simplify the second term OIn || / OlnW,, an example is used to show 


that 


éln|W| a 
= 2 
ain, base (82) 


where [wr] is one element of the inverse of the transposed unmixing matrix: 
ij 
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-l 


wis [ w" (83) 


When M =2, the unmixing matrix W is 


w-|" ia (84) 
Wy Wo. 
The transpose W’ is 

Win Wy 


and the determinant of the unmixing matrix |W] is equal to the determinant of the 


transposed unmixing matrix w"| : 


|W = Wi Wo. — Wa Wy = \w"| . (86) 
When i= j=1, 
éIn|W| = O In(w,,W) — W3,W>) zs W5, — Wop (87) 
OlnW,, Wi Wi Wo. — Wo Wio |W] 


The inverse transpose ww, using Equation (85) in Equation (83), is 


wr = [w" ik | Wy | = Wo. a Wes (88) 


Wi Wo. —WoWig L Wa, Wy WW. — Wo Wo || 





The methods in Equation (87) and Equation (88) yield the same result, as is the case for 
all values of i and j. This example illustrates Equation (82), which is accepted without 


proof as is done in [12]. 


c. The Gradient of Entropy 


Equation (82) can be substituted into the expression for the gradient of 
entropy in Equation (81), resulting in 
A -£{S¥opx,f [Ww], (89) 
i=l 
When this expression is expanded to all elements in the unmixing matrix W, it yields a 
complete expression for the gradient of entropy V/, where the gradient of a scalar with 


respect to a matrix is defined as 
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[ ah = Oh oh 
aW, OW, Wy 
dh oh ah 
Vh=| @W,, OW,  OW,,, |. (90) 
Oh Oh Oh 
| OW, OW, OWyay | 


The gradient of entropy V/ for all elements of the unmixing matrix W is then 
Vh=W"+E{¥(y)x"}. (91) 

By assuming that signals are ergodic [7], the expectation can again be mitigated, resulting 

in 

rg 


—-T 1 “ t t 
Vh=W + vax | (92) 


=I 
oH Gradient Ascent Algorithm for Infomax 
The optimal unmixing matrix W is found by maximizing entropy; that is, 
iteratively following the gradient Vf until a local maximum is reached. This is 
accomplished with the following algorithm 
Wiew = Wo +7VA (93) 
where 7 is a small constant. Inserting the expression for Vh in Equation (92) into 
Equation (93), we get the expression to optimize the unmixing matrix W to maximize 


entropy: 
—T 1 ‘ t t i 
Wiew = Wai aa 1) Wai aa pe )[x ] (94) 
t=1 


Equation (94) is the general form of the Infomax algorithm using gradient ascent to 
optimize the unmixing matrix W. It is important to note, however, that the gradient 
ascent algorithm only finds a local maximum, which is not necessarily the global 
maximum of the function. This can be mitigated by running multiple trials of the 


gradient ascent algorithm, initiated from different starting points. 


Zi 


The expression ‘¥(y’) is shown in Equation (80) to equal g"(y)/g'(y). It should 
be recalled that g(y) is the model cdf of the source signals, so ‘¥(y’) depends upon the 
source signals that the algorithm aims to extract. Since W is optimized by maximizing 
entropy of the transformed signals Y=g(y) and maximum entropy signals Y are 
mutually independent, the signals y=g'(Y) are also mutually independent. Since 
Infomax extracts a set of signals y which are mutually independent and the only 
mutually independent signals possible are the original source signals, the results of the 


optimized algorithm in Equation (94) are the source signals s. 


In this chapter, the Infomax algorithm was derived and its implementation using 
gradient ascent was analyzed. In Chapter IV, the expressions for entropy and gradient are 
applied to specific signal types, and the results from simulations of the Infomax method 


using MATLAB are given. 
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IV. MODELING AND SIMULATION 


Chapter III was dedicated to obtaining expressions for entropy and its gradient. 


M N 
The entropy ACY), as shown in Equation (63), is h(Y) = —yyhn P.O) +In|W] . The 


i=l t=1 


N 
gradient Vh was shown in Equation (92) to be Vha=W" +> vex] , where 


t=1 
Vy’) is g"(y)/g'(y). The change in entropy is maximized using the gradient ascent 


algorithm in Equation (93), W.,,., = Wog +7VA. 


It is clear from the equations above that both the entropy and its gradient are 
dependent upon a model pdf of the source signals, p,(y) or g'(y), and it should be 
recalled that the extracted signals are transformed through their model cdf Y= g(y). 


Stone refers to this as the cdf/pdf-matching property of Infomax, and it is through the 
careful selection of a model cdf and pdf that the Infomax method is tuned towards the 
desired type of extracted signal. If one desires extracted speech (or audio) signals, as in 
the cocktail party example of Chapter I, a model cdf and pdf that resembles audio signals 
should be used in the Infomax algorithm [12]. 


A. HIGH-KURTOSIS SIGNALS 


Kurtosis K is a measure of peakedness of a pdf and is defined as 


- E{x'} (95) 


where E x‘ is the fourth central moment and E a is the second central moment. 


Gaussian signals have zero kurtosis. When kurtosis is negative, a signal is sub-Gaussian. 
A super-Gaussian signal has positive kurtosis and is referred to as a high-kurtosis signal 


[12]. 


Figure 5 depicts a sample audio signal of the first few bars of the “Hallelujah 
Chorus” from Handel’s Messiah, from MATLAB’s “Datafun” directory. 
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Figure 5 Sample audio signal. 


As audio signals spend the majority of their time around zero, they typically have super- 
Gaussian pdfs. Therefore, Infomax uses a super-Gaussian pdf to model high-kurtosis 


signals such as audio signals [12]. 


iF Adapting the Infomax Algorithm 


A typical cdf used to model high-kurtosis signals is the hyperbolic tangent, 
plotted in Figure 6. While this is not a valid cdf because it contains negative values, it 
was used successfully in [12], and was also used in these simulations. Simulations were 
also run with a shifted and scaled version of the hyperbolic tangent as a model cdf, and a 
shifted and scaled version of its derivative as the model pdf. These were more accurate 


representations of the cdf and because they were valid models: the cdf was positive and 


ranged from [0,1], and the pdf had an area of 1. The results of the scaled and shifted 


hyperbolic tangent were consistent with the results using the true hyperbolic tangent 
function. The original version of the hyperbolic tangent was used for ease of comparing 


results with the work done in [12]. 
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Hyperbolic Tangent 





g(¥) = tanh(y) 




















Figure 6 Hyperbolic tangent function. 


Hence, 
Y = g(y) = tanh(y) (96) 


Since the cdf g(y) was chosen to be the hyperbolic tangent in Equation (96), the 
derivative of this is g'(y), or the pdf, which is illustrated in Figure 7 [11]. This pdf is not 
ideal because its area is 2 , but was also used successfully in [12]. Hence, 


g(y)= <tanhiy) = sech*(y) =1-tanh’(y) (97) 
ry 





Hyperbolic Tangent Derivative 


= 1-tanh?(y) 


9° 














Figure 7 Hyperbolic tangent derivative. 
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Adapting Equation (63) for a high-kurtosis signal, the expression for entropy becomes 
1 M N 
h(Y)=—)" )'In(1- tanh? y!)+In|W]. (98) 
i=l t=l 
The gradient of entropy V/A contains the ratio Y(y), where Y(y) = g"(y)/g'(y). 
While the pdf g'(y) was given in Equation (97), the derivative of the pdf, g"(y) is also 


necessary and is 
d d 2 d 2 d 
“(y) =— g (y) =—(1-tanh = ——(tanh = —2 tanh(y) — tanh 99 
sw=78W) a (y)) 7 (y)) (9) tanh(y) (09) 


Since d/dy(tanh(y)) is equal to g'(y) , Equation (99) becomes 


~2 tanh(y)g ‘(y), (100) 
and the resulting expression for the derivative of the model pdf is 
g"(y) =—2tanh(y)g (y). (101) 


The ratio Y(y) in the gradient Vh is 


: —2 tanh A 
wy) = ty) 7 (8 (y) _ 
gy) gy) 
Adapting Equation (92) for a high-kurtosis signal, we get the following gradient Vh 





2 tanh(y) . (102) 


Vh=W" +> Panh(y')[x"] : (103) 


t=l 
Expressions for high-kurtosis implementation of Infomax have been obtained, so 


now a computing tool is necessary to perform the algorithm effectively and efficiently. 


2. Implementation in MATLAB 


Appendix D in [12] shows sample code for the implementation of the Infomax 
algorithm for high-kurtosis signals using gradient ascent. The code was adapted for 
easier entry of variables, which allowed inputs to be changed and simulations to be run 
and re-run with greater ease. With Stone’s default values for step size 7 (77 =0.25) and 
maximum number of iterations (100), the algorithm converged approximately half the 
time. It was clear that an improved gradient ascent algorithm was necessary for more 
consistent performance, and some additional complexity in the code was accepted to 


improve the reliability of the results. 
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a. Improving the Gradient Ascent Algorithm for Faster 
Convergence 


The goal of improving the gradient ascent algorithm W.,.,, = W,,, +7VA 


was not to find the optimal algorithm but instead to increase the rate of convergence as 
well as the accuracy of the algorithm results. The unmixing matrix W is initially set to 
the identity matrix. However, rather than taking the same size step in the direction of 
maximum increase, larger steps are taken as long as entropy h(Y) continues to increase. 
If entropy decreases, it is assumed that the algorithm missed the maximum. Rather than 
continuing to take steps, the algorithm regresses to the last “good” values of W and 
h(Y) and begins taking smaller steps, which gradually increase again as long as entropy 


increases. A flow chart of the improved gradient ascent algorithm is shown in Figure 8. 


b. Improving Maximization by Varying Initial W 


Gradient ascent finds the nearest local maximum of a function, which is 
not necessarily the maximum function value. Since Infomax finds extracted signals by 
maximizing entropy, it is necessary for the global maximum to be found for the algorithm 
to properly converge. A higher probability of locating a global maximum was obtained 
by creating a function of the improved gradient ascent algorithm so that gradient ascent 
could be repeated multiple times with varying starting points. The gradient ascent 
function encompasses the process shown in Figure 8. The Infomax MATLAB code was 
adapted so that the gradient ascent function could be repeated multiple times. The first 
repetition uses the identity matrix for the initial value of W, and subsequent repetitions 
randomly select an unmixing matrix W, which is then multiplied by some small step 
constant and the repetition number. The adapted MATLAB code for high-kurtosis 
signals is included in Appendix D, and the gradient ascent function is included in the 


Appendix E. 
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Figure 8 Improved gradient ascent algorithm flow chart. 
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3. Results and Conclusions 


The modified MATLAB code was run for M signals with M =2, 3, and 6. 
Adding to the complexity of the code increased the run time but also resulted in more 
reliable convergence (approximately 80% of simulations with M =2 converged in 35 


iterations or fewer). 


a. Results for M =2 Source Signals 


High-kurtosis signals were imported from MATLAB’s “data fun” 
directory and are shown as Signal | (a bird chirp) and Signal 2 (a gong) in the first row of 
Figure 9. The random mixtures of the two source signals are shown in the second row of 


Figure 9. The Infomax algorithm was run with the values shown in Table 1. 


Table 1 Algorithm variable values for M =2. 

















Infomax algorithm iterations 100 
Initial step size 7 0.1 
Step size increase factor @ 12 
Step size decrease factor £ 0.1 
Gradient Ascent repetitions 5 














The values of these variables were not analyzed to be optimal, but produced consistent 
results. The signals extracted by the Infomax algorithm are shown in row three of Figure 
9. The similarity of the extracted signals to the original source signals is evident. As 
Infomax does not preserve the ordering of the signals, the results would sometimes be 
reversed (as shown), with the gong extracted as “Extracted Signal 1” and the chirp 
extracted as “Extracted Signal 2.” Additionally, Infomax does not necessarily preserve 
the sign of a signal with a pdf that is even about zero, although this is not evident in 


Figure 9. 
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Figure 9 Infomax results for M =2. 


Figure 10 shows gradient ascent function values for entropy /A(Y), 


gradient of entropy VhA, and step size 7. In Figure 10, results converged after 


approximately 15 iterations, although, generally, the results converged after 35 to 50 
36 


iterations. The middle plot depicts |Vh , the magnitude of the entropy gradient. As 





entropy is maximized, the magnitude of the gradient decreases, indicating the maximum 


is very near the current value. The bottom plot shows how the step size 7 changes with 


the “learning” gradient ascent algorithm. 
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Figure 10 Entropy ACY), gradient Vh, and 7 values for M =2. 
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b. Results for M =3 Source Signals 


The addition of a third signal from the MATLAB “Data fun” directory 
(the “splat” sound of spilling paint) and third signal mixture to the Infomax algorithm 
slightly increases the complexity and computation time. The values used in the algorithm 


are shown in Table 2. 


Table 2. Algorithm variable values for M =3. 














Infomax algorithm iterations 100 
Initial step size 77 0.1 
Step size increase factor @ 1.2 
Step size decrease factor # 0.1 
Gradient Ascent repetitions ) 














With the same number of iterations and the same number of gradient 
ascent repetitions performed, the results were fairly consistent with those obtained in part 
a, although they converged less often (approximately 50% of runs converged). The 
results converged more frequently when the number of gradient ascent repetitions 
increased, but that also increased the computation time. The results of a sample run of a 
“chirp,” a “gong,” and a “splat” are shown in Figure 11. As in the two-signal case of part 
a, the extracted signals are clearly close matches for the source signals. Also, although 


Figure 11 suggests otherwise, in general signal ordering is not preserved [12]. 
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c. Results for M =6 Source Signals 


The Infomax method was tested on six source signals and mixtures from 


MATLAB’s “Data fun” directory, with the types of each source signal shown in Table 3. 


Table 3 Signal descriptions for M =6 source signals. 





Signal 1 Signal 2 Signal 3 Signal 4 Signal 5 Signal 6 








chirp gong splat laughter train whistle music 




















The values for the Infomax iterations and gradient ascent variables are 


shown in Table 4. 


Table 4 Algorithm variable values for M =6. 

















Infomax algorithm iterations 100 
Initial step size 77 0.1 
Step size increase factor @ LZ 
Step size decrease factor £ 0.1 
Gradient Ascent repetitions 5, 10, 15, 20, 25, 50, 100 














In the six source signal case, multiple runs with increasing numbers of 
gradient ascent iterations were attempted in order to find a reasonable number of 
iterations where the function converged fairly consistently. For five iterations of the 
gradient ascent function, it was clear that similarities between the source signals and 
extracted signals existed, and the entropy value was maximized as shown in the first plot 
of Figure 12. However, zero of ten trials with five gradient ascent repetitions truly 
converged, as is evident from a sample trial shown in Figure 13. In fact, even 100 
repetitions of the gradient ascent algorithms did not produce consistently convergent 


results, although entropy was still maximized, as evident from the top plot in Figure 14. 
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The source and extracted signals for 100 gradient ascent repetitions are shown in Figure 
15. Although the results are slightly better than the five repetition case, there was a 


significant increase in computation time for a relatively small improvement in results. 





Function Values - Entropy 





40 50 60 70 80 90 100 
Iteration 


Magnitude of Entropy Gradient 


0 10 20 30 


Gradient Magnitude 
ar 


n 





o 
= 
o 
N 
oO 
Ww 
o 
& 
oO 
n 
o 
[2] 
o 
~~ 
o 
oo 
o 
© 
o 
= 
oS 
o 











Iteration 
Magnitude of Eta 
0.35 
0.3 
@ 0.25 
=) 
= 
= 02 
= i 
= 0.15 
= 
uw 0.1 
0.05 
0 
0 10 20 30 40 50 60 70 80 90 100 
Iteration 
Figure 12 h(Y), Vh, and 7 values for M =6 for 5 gradient ascent repetitions. 
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Source and extracted signals for five gradient ascent repetitions. 
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Figure 14 h(Y), Vh, and 7 values for M =6 for 100 gradient ascent repetitions. 
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Source and Extracted Signals for 100 gradient ascent repetitions. 
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d. Conclusions 


For high-kurtosis signals, the Infomax algorithm with multiple iterations 
of a “learning” gradient ascent function proved quite useful in extracting small numbers 
of signals from signal mixtures. As the number of source signals increased, the algorithm 
was less able to reliably extract source signals quickly and consistently. This was likely 
because the random mixtures of an increased number of source signals s resulted in 
signal mixtures x with “bumpier” entropy functions; i.e. entropy h(Y)=h(Wx) with 
many local maximums. The increased occurrence of local maximums required more 
iterations of the gradient ascent function with different initial values for the unmixing 
matrix W to find a starting point that might find the global maximum rather than a local 
maximum, since the Infomax algorithm only converges and extracts source signals by 


finding the global maximum of entropy h(Y). 


B. SIMPLE COMMUNICATIONS SIGNALS 


Once it was determined that the Infomax algorithm was fairly reliable for small 
numbers of high-kurtosis signals, it was tested on a simple communications signal, the 


polar non-return to zero signal [5]. 


1. Adapting the Infomax Algorithm for a Polar NRZ Signal 


The implementation of a polar NRZ signal into the Infomax algorithm first 
required a function that would create a polar NRZ signal in MATLAB. Code for this 
function is included in Appendix F. The polar NRZ function was used to generate two 
signals with randomly selected amplitudes, bit rates, and time shifts, and these signals 
were randomly mixed. The first attempt at extracting polar NRZ signals used the high- 
kurtosis model cdf g(y) =tanh(y) and model pdf g'(y) =1—tanh’(y). As expected, the 
algorithm was unsuccessful at extracting the two polar NRZ signals after multiple 


attempts, and it was necessary to model the cdf and pdf of a polar NRZ signal. 


A sample polar NRZ signal is shown in Figure 16. The cdf is shown in Figure 17, 


and its associated derivative pdf is shown in Figure 18. 
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Figure 16 Sample polar NRZ signal. 
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Figure 17 Theoretical polar NRZ cdf. 
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Figure 18 Theoretical polar NRZ pdf. 


The equation representing the cdf g(y) of a polar NRZ signal as shown in Figure 17 is 


g(y) =0.5u(y +1) +0.5u(y-1), (104) 
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where u(y) is the unit-step function. The corresponding derivative g'(y) is 

g'(y) =0.56(y +1) +0.56(y -1) (105) 
The theoretical cdf and pdf for the polar NRZ signal create a dilemma in that they cannot 
be implemented in MATLAB due to the delta function in the pdf. Therefore, it was 
necessary to create approximations of the cdf and pdf that could be implemented in 


MATLAB. 


2. Creating Model Statistical Functions 

As Infomax aims to extract source signals based on a model cdf and pdf, it was 
necessary to create functions for the cdf g(Y), the pdf g'(Y), and the derivative of the 
pdf g"(Y) that could be implemented in the Infomax code using MATLAB. This was 


accomplished by two separate approaches. The first of these approaches modeled the 
delta functions in the pdf as triangles with arbitrarily narrow bases. The second approach 
used a modified version of the hyperbolic tangent function to more closely model the cdf 


and pdf of a polar NRZ signal. 


a. The Triangular Model 


The first approach involved approximating the delta function in the pdf of 
Figure 18 as a triangle with a base of 27, where y was chosen as an arbitrarily small 


number, as shown in Figure 19. 


Approximate Polar NRZ Probability Density Function (pdf) 
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Figure 19 Approximate polar NRZ pdf. 
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The approximate pdf and cdf are derived in Appendix C. The resultant expression for the 


approximate pdf q'(y) for a polar NRZ signal is 














0 y<-(+y) 
1 l+y 
+ l+y)<y<-l 
ay? ay? (l+y)<y 
—l y-l 
+ l<y<y-l 
ay?” 2y? y Y 
q(y) = 40 y-l<y<l-y (106) 
1 y-l 
— yt l-y<y<l 
ay?” 2y? Y y 
—] l+y 
— yt l<y<l+ 
2?” 2y7 y Y 
0 y>lt+y 


The function was implemented in MATLAB (code for “polarNRZpdf’ in Appendix H). 


The approximate cdf q(y) was found by integrating the pdf qg'(y), and is 


0 y<-(+y) 
Sm(y*-+7)*)+o(y-d=7)) “d+/)<y<-l 
0.25+| 2m(1-y")+0(1+ »)] -l<y<y-l 

q(y) =49.5 y-lsy<l-y (107) 


a(3)=05+] 2m(y? 0-19") +0(y-0-) l-y<y<l 


1 
u(y) =0.154| 2m(1-y")+P(y-1) lsy<l+y 
1.0 y>lt+y 
The function was implemented in MATLAB (code for “polarNRZcdf”’ in Appendix G). 
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The derivative of the pdf g"(y) was simple to construct by taking the 


slope values from the appropriate regions of the pdf, resulting in 


0 Veatey) 
ae ~(l+y)<y<-l 
—1 
wy -I<y<y-l 
q'(y) = 40 y-lSyai-y (108) 
ze l-y<y<l 
5 l<y<lt+y 
0 y>l+y 


Code for the “polarNRZdpdf” function is included in Appendix I. 


The MATLAB implementations of the approximate model cdf qg(y), pdf 
q'(y), and pdf derivative g"(y) are plotted for y=0.1 in Figure 20. The approximate 
functions g(y) and g'(y) bear a striking resemblance to the theoretical functions g(y) 
and g'(y) shown in Figure 17 with the added benefit that they can be implemented in 


MATLAB so that the Infomax algorithm can be adapted for the polar NRZ signal. 
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Figure 20 Approximate polar NRZ cdf q(y), pdf g'(y), and g"(y) for y=0.1. 
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b. The Hyperbolic Tangent Model 

Since the long range of zeroes in the triangular model caused divide by 
zero problems in the function (Equation (80)) when implemented, and the hyperbolic 
tangent was used successfully in approximating the cdf and pdf of the high-kurtosis 
signals in part A, modifications were made to adapt the theoretical polar NRZ cdf from 


Figure 17 and pdf from Figure 18. 


A second approximate polar NRZ cdf @(y) is created by duplicating and 


shifting the hyperbolic tangent function so as to more closely approximate the theoretical 


cdf. A good approximation of the cdf is found to be 


Ztanh (o(y-+1)) +1 y<0 
Ay) = (109) 
qian (o(y—D)+3 y>0 


where o is a compression factor that controls the slope of the function. 
The approximate pdf @'(y) is found by taking the derivative of the cdf 


O@(y), and the result is 


5 (1 tanh” (o(y+))) y<0 
O(y)= (110) 
(1 tanh? (o(y-D)) y20 


An expression for @"(y) is found by taking the derivative of the pdf 


Oy): 
0%) = Loy) = £{ S(1-tanh? (ovy+0))}=0--2[ Fan? (oy b)) (111) 
dy dy\.4 dy\.4 
Using the chain rule, we get 
oy) =-( 72 tan (ory) | £(1-tanh* (a(yD)) Jo (112) 


From Equation (110), @'(y) = mal — tanh’ (o(y+1))), so Equation (112) simplifies to 
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= 
(a) "(y) _ 





tanh (o(y+1))0(y) (113) 


The resulting expression for the approximate pdf derivative @"(y) is 


© tanh(o(y+))O(y) y<0 
a"y)= (114) 


2 


i tanh(o(y-1))O(y) y20 








The approximate polar NRZ cdf @(y) and pdf 6'(y) using the hyperbolic 
tangent function are shown in Figure 21. Like the triangular model, they closely 
resemble the theoretical versions in Figure 17 and Figure 18. MATLAB code for these 


functions is included in Appendices J, K, and L. 
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Figure 21 Approximate polar NRZ cdf @(y) and pdf @'(y) with o =10. 
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3. Results and Conclusions 


Numerous simulations were run using the triangle model to approximate polar 
NRZ statistical functions with y=0.1. These simulations were repeated using the 
hyperbolic tangent model with o =10 so as to closely approximate the theoretical cdf 
and pdf of the polar NRZ signal. The long ranges of zero for g'(y) and qg"(y) produced 
with y=0.1, as shown in Figure 20, and the long ranges of near-zero values for 0'(y) 
caused the Infomax algorithm to behave poorly. As ‘¥(y) 1s the second derivate divided 
by the first derivative, and the first derivative contained long ranges of zero, the divide by 
zero produced erroneous results. To eliminate the long ranges of zero values, trials were 
repeated using the triangle model with y=1.0 (Figure 22) and the hyperbolic tangent 


model with o =2 (Figure 23). 


Setting y equal to 1.0 removed the long stretches of zeroes in the center of the 
pdf g'(y) and its derivative q"(y), as shown in Figure 22. This proved advantageous in 


that simulations with M=2 signals were successful over fifty percent of the time. 
However, also evident from Figure 22 is that the resulting cdf qg(y) and pdf q'(y) are 
much poorer approximations of the ideal cdf g(y) and pdf g'(y). Similarly, using the 
compression factor o =2 in the hyperbolic tangent model eliminated the long ranges of 
near-zero values, as is evident from Figure 23. This improved the reliability of the 
Infomax algorithm. Again, the resulting cdf @(y) and pdf O@'(y) are poorer 


approximations of the theoretical cdf and pdf of the polar NRZ signal. 


In both cases, the less ideal approximation of the polar NRZ cdf and pdf produced 
the best results for the Infomax algorithm. However, all results in this section were 
obtained using the hyperbolic tangent model with o =2 because the hyperbolic tangent 
model provided a more continuous pdf which was differentiable at more points than the 
triangular model pdf. Therefore, the hyperbolic tangent model’s approximation of the 
polar NRZ cdf and pdf produced more consistent results which converged faster, and 
with a greater number of signals and signal amplitudes, than the triangular model 


approximation. 
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Figure 22 Approximate polar NRZ cdf g(y), pdf g'(y), and g"(y) for vy =1.0. 
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Approximate Polar NRZ cdf 6(y) 
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Approximate polar NRZ cdf 0(y), pdf @'(y), and @"(y) with o =10. 
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Figure 23 


a. Results for M =2 Source Signals 


The Infomax algorithm was run with two source signals of randomly 
selected amplitude, bit rate, and time shift using the values shown in Table 5, and 


converged in 70% of trials. 


Table 5 Algorithm variable values for M =2. 

















Infomax algorithm iterations 100 

Initial step size 7 0.05 
Step size increase factor @ Le 
Step size decrease factor # 0.1 
Gradient Ascent repetitions 50 














Figure 24 shows the results of a sample trial, where the top row shows the 
randomly generated source signals, the middle row shows the random mixtures of the 
signals, and the bottom row shows the extracted signals. It is clear from Figure 24 that 
Extracted Signal 1 shares the same bit pattern as Source Signal 2, but with a different 
magnitude, and the case is the same with Source Signal 1 and Extracted Signal 2. In fact, 
every single successful run of the Infomax algorithm extracted signals with a magnitude 
of approximately one. This result is due to the tendency of the Infomax algorithm to 
match the pdf of the extracted signal to the model pdf supplied to the algorithm. Since 
the model pdf was developed around a polar NRZ signal with an amplitude of 1.0, the 
extracted signal also had an amplitude of 1.0. In addition to not preserving the 
magnitude of the original signal, repeated trials showed the algorithm did not necessarily 


preserve the ordering or the phase (+/-—) of the extracted signals, as was the case with 


the high-kurtosis signals. 
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Figure 24 Source and extracted signals for M =2 polar NRZ signals. 
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b. Results for M =3 Source Signals 


The Infomax algorithm was repeated for three polar NRZ source signals of 
the same amplitude and a randomly selected bit rate and time shift, as well as three 
source signals with randomly selected amplitude, bit rate, and time shift using the values 


shown in Table 6. 


Table 6 — Algorithm variable values for M =3. 














Infomax algorithm iterations 100 
Initial step size 77 0.01 

Step size increase factor @ ay 

Step size decrease factor # 0.1 
Gradient Ascent repetitions 100 














Trials with M =3 required more iterations of the gradient ascent 
algorithm and converged less frequently than the two-signal case. While the Infomax 
algorithm was successful in extracting two source signals often (70% of trials), it only 
truly converged to extract all three source signals in 5% of trials. The result of a 
converging trial is shown in Figure 25. From this figure, it is clear that Extracted Signal 
1 is Source Signal 2 with amplitude of one. Source Signal 1 is extracted as Extracted 
Signal 2, and Source Signal 3 is Extracted Signal 3. As was the case for M =2 polar 
NRZ source signals, successful trials do not preserve the magnitude of the original source 
signals, although Infomax was equally capable of distinguishing between signals with the 
same amplitude (not shown) and signals with randomly selected amplitude. Additionally, 


the algorithm does not always preserve signal order or phase. 
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Source and extracted signals for M =3 polar NRZ signals. 


c Results for M =6 Source Signals 


The Infomax algorithm was again repeated for six polar NRZ source 
signals of randomly selected amplitude, bit rate, and time shift using the values shown in 


Table 7. 


Table 7 Algorithm variable values for M =6. 











Infomax algorithm iterations 100 

Initial step size 77 0.01 
Step size increase factor @ Le 
Step size decrease factor £ 0.1 
Gradient Ascent repetitions 300 














For the given values, no signals of significance were extracted. Successful 
implementation for a reasonably fast convergence of the algorithm for greater than three 
source signals either requires a more advanced gradient ascent algorithm or a more 
exhaustive search for the optimal unmixing matrix W, resulting in a significantly 


increased simulation time. 


d. Conclusions 


For polar NRZ signals, the Infomax algorithm as implemented in 
Appendix M was consistently successful in extracting two signal patterns from two signal 
mixtures; however, it does not preserve signal magnitude, ordering, or phase (+/-). 
Magnitude is not preserved as the algorithm matched the signal mixtures to the model 
pdf. Signal order is not preserved because the order was merely convention, and there is 
no way for the algorithm to determine that convention from the signal mixtures. Phase, 


or sign (+/—), is not preserved due to the symmetry of the pdf around zero. 
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When the number of source signals grows to M > 2, the algorithm is not 
reliable in extracting M source signals quickly and consistently. Similar to the high 
kurtosis case, this is likely because the random mixtures of an increased number of 
source signals s result in signal mixtures x with a “bumpier’ entropy function 
h(Y) =hCWx) with many local maximums. The increased occurrence of local 
maximums requires more repetitions of the gradient ascent function with different initial 
values for the unmixing matrix W to find a starting point that finds the global maximum 


rather than a local maximum of entropy h(Y). 


This chapter modeled the Infomax algorithm for both high-kurtosis and 
polar NRZ signals and presented the results and conclusions of MATLAB simulations for 
each individual case. Chapter V presents the broad conclusions of the research as well as 


opportunities for future research in this subject area. 
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V. CONCLUSIONS AND FUTURE WORK 


A. CONCLUSIONS 


Independent Component Analysis using the Infomax method of maximizing 
entropy proved to be a feasible solution to the blind source separation problem for small 
numbers of signals. The assumptions made in Chapter III.C.2, although not necessarily 
accurate, were very useful in that they allowed signals to be extracted successfully. The 
gradient ascent algorithm described in Chapter III.D was sufficient to obtain solid results 


for small numbers of signal mixtures (VM <3), although greater numbers of mixtures 


required more calculation time for convergence. This result was due to the ability of the 
gradient ascent algorithm to locate only the nearest local maximum, while the Infomax 


algorithm seeks the location of the global maximum of entropy. 


By choosing a pdf that matched the desired signals to be extracted, the Infomax 
algorithm was effectively tailored to both a typical high-kurtosis audio signal mixture and 
a simple communications polar NRZ signal mixture. In both cases, Infomax was found 
to be a speedy and reliable method of extracting a small number of signals from a small 
number of signal mixtures. Complexity of the calculations increased with increased 
number of source signals for both types of signals. While the added complexity of 
additional signals makes the problem difficult, it is not impossible to adapt the Infomax 
algorithm for larger quantities of signals. Greater numbers of mixtures would require 
additional improvements to the gradient ascent algorithm or more calculation time in 


order to obtain meaningful results. 


Overall, the Infomax method of ICA implemented with multiple iterations of a 
gradient ascent algorithm as implemented in this thesis was quite successful for 
extracting a small numbers of the signals out of the same number of mixtures. There are, 


however, several opportunities for extended research on this topic. 
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B. FUTURE WORK 


While ICA is still a relatively new research field, its popularity is rapidly 
increasing, and opportunities for future work in this subject matter are abundant. 
Research opportunities range from extensions of this research to exploring other ICA 


methods. 


1. Extensions of ‘Independent Component Analysis by Entropy 
Maximization (Infomax)” 


a. Signals with Identical Bit Rates 


While Chapter IV.B considered polar NRZ signals with both identical and 
random amplitudes, only randomly selected bit rates were analyzed. Further work could 
be done to determine the ability of the Infomax algorithm to distinguish between source 


signals with identical bit rates. 


b. Gradient Ascent Optimization 


The Infomax algorithm was found to converge consistently only for small 


numbers (M <3) of source signals and signal mixtures. With M =6 source signals and 


mixtures, some similarities between the source and extracted signals were found for 
audio signals, but little similarity was found between source and extracted polar NRZ 
signals. Further investigation of an optimal gradient ascent algorithm would likely 
improve the efficiency of the Infomax algorithm as well as extend its capability beyond 


just a few signal mixtures. 


2. Additional Signals 


While this research focused on audio signals and the polar NRZ communications 
signal, Infomax could be applied to a variety of other communications signals. 
Additional communication signal types such as signals at intermediate frequency (IF) or 
radio frequency (RF) with different probability density functions should be considered as 
well as wireless local area network (WLAN), cellular (especially CDMA), and 
frequency-hopped signals. 
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3. Other ICA Methods 


Infomax is just one of many methods of ICA. Numerous other methods could be 
developed, tested, and refined to determine the relative applicability and efficiency with 
various signals. These methods include sphering or whitening (separating the signals 
from additive white Gaussian noise) to the more advanced methods described in Chapter 
Il. All of these possibilities for additional research paint an exciting future for 


breakthroughs in ICA as a method of Blind Source Separation. 
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APPENDIX A. LIST OF VARIABLES 


A arbitrary event 
E{X} expected value of X 
g(y) model cdf through which y is transformed 
g'(y) pdf of source signal, also expressed as p,(y) 
g"(y) derivative of pdf g '(y) 
H(A) entropy of event A 
H(Y) entropy of transformed signal Y 
h(Y) relative entropy of transformed signal Y 
Vh _ gradient of entropy 
I(A) information associated with event A 
J Jacobian matrix 
J determinant of Jacobian matrix 
number of source signals 
N number of time elements 
p,(y) pdf of source signal, also expressed as g '(y) 
Py(Y) pdf of the mapped signal Y = g(y) 
P,(y) pdf of extracted signal y 
q(y) triangular model approximation of polar NRZ cdf 
q(y) triangular model approximation of polar NRZ pdf 


q'(y) triangular model approximation of polar NRZ pdf derivative 


67 


7 


Oo 
Y(y) 
Hy) 


0"y) 


time index 

(MxM) unmixing matrix 

single element of i” row and j” column of W 

(M x N) matrix of signal mixtures 

signal i of x at time ¢ 

= g(y), transformed signal matrix 

(M x N) matrix of extracted signals, y = g '(Y) 

signal i of y at time ¢ 

small constant, increased gradient ascent step size 

small constant, decreased gradient ascent step size 

small constant, base of triangular model for polar NRZ pdf approximation 
small constant, initial gradient ascent step size 

small constant, compression factor for polar NRZ cdf approx (tanh model) 
ratio g"(y)/g(y) 

modified tanh model approximation of polar NRZ cdf 


modified tanh model approximation of polar NRZ pdf 


@"(y) modified tanh model approximation of polar NRZ pdf derivative 
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APPENDIX B. MAXIMUM ENTROPY YIELDS INDEPENDENT 
SIGNALS (PROOF) 


The discrete version of this proof is shown in [3] and is derived in two parts. The 
continuous version presented in this Appendix follows [3] closely. The first part of this 


proof shows that entropy is additive for independent random events, while the second 


part utilizes the inequality In x= 1-(1/ x) to prove that entropy of dependent random 


events is always less than entropy of independent events. Since independent events have 
greater entropy than dependent events, the maximization of entropy results in 
independent events. When applied to the Infomax principle, maximizing entropy through 


gradient ascent results in independent signals. 
A. ENTROPY IS ADDITIVE FOR INDEPENDENT RANDOM EVENTS 


The independent random events X=(X,,X,,...,X,) have a joint pdf 


Px (X) = Py (%) Py, (%))++* Py, (%,). The entropy of X, is 
H(X,)=E{-In py (x)}=—] py, (%)In py, (%)dn. (115) 


The joint entropy H (X) = E{-In Ds (x)} iS 
H(X)=-J---[ py, a) Py, )In| py, Px, @,) [ade (116) 
From the properties of logarithms, 
In| py, (%)-~ Py, (x,) | =In py, (x) +++-+In py (x,) (117) 
and entropy H(X) can be rewritten as 


00 00 


H(X)=-f-= | py, Py, ,) [ln Py) 4--#ln py (x,) Jey, (118) 


—00 


This expression for entropy can again be rearranged, yielding 


H (X) a } Py, (%)In py (4 )dx, +-+++ | Px, (%,)1n py (x, )dx, |- (119) 
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From Equation (115), H(X,)= —{ Px, (%)In py (%)dx, , so Equation (119) can again be 


rewritten as 
H(X)=H(X,)+--+H(X,), (120) 
which proves entropy is additive for independent random events. 


B. ENTROPY IS LESS FOR DEPENDENT RANDOM EVENTS THAN 
INDEPENDENT RANDOM EVENTS 


Equation (120) is true for any number of independent random events and can be 


expressed as the summation 


H(X,)+--+H(X,)= 4 (X,). (121) 


k= 


- 


The summation >’ H(X,) can also be expressed as 
k=1 


> A(X.) => |-| ps, in py, oe, |, (122) 
k=l k=1 mien 
or 
H(X,)=—[--[ px (%.---.%,)> In py, (dx, (123) 
k=l 0 8-0 k=l 


Using the properties of logarithms, we can rewrite Equation (123) as 


D(X, )=-fo] px Gy x, In] J py, ede dx, (124) 

k=l wey 1 ee k=l 
The second part of this proof assumes X=(X,,X,,...,X,) are not the 
independent random events of part one but rather dependent random events. As X is no 


longer independent, the pdfs do not multiply as in part one. Instead, entropy H (X) iS 


foe) co 


H (X)= A(X. X,)=— ff Py seem) Py, ) py (5.04%, dy +, (125) 


To prove that entropy is less for dependent random events than independent 


random events, Equation (125) is subtracted from Equation (124) to get 
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= (126) 


eo) ee) 


=-| of Px (poe, )INT ] py, (ede, dr, 
k=1 


—00 —0o 


+] vf Dx (X00 %,) 0° Py (%,) Inpy (4... %, EX, dX (127) 


Factoring out J] pxG@r--0%,), we get 


—00 —00 


foe} ioe) 


=i) = [py (m-a%,)] In see + In( py (%5.--5%,)) [dade (128) 
= = LlenGe 
k=1 


Since In(a)+In(b) =In(ab) , Equation (128) can be rewritten as 


=| = ( PxGiaok) in Basie) | pe sogie, (129) 
ae eS [] px, 
k=1 


The inequality In x >1- (1/ x) can be applied to Equation (129), resulting in 


> Ltt | peas enti) 12 dx, ++: dx . (130) 
Pp 


Distribution of the integral yields 





(oe) (ee) ie} ee) Pepe pe 
> [of Pyle dry, — [oof CRIED dx,---dx,. (131) 


The first part of Equation (131) is the area under the joint pdf, which equals one. The 
second part of Equation (131) is the product of the area under each independent pdf, 


which also equals one. Therefore, 
A(X, )-A(X,....X,)21-1=0 (132) 
k=1 


Rearranging Equation (132), we obtain 
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H(X,,....X,)S ),H(X;), (133) 


k=l 
thus, proving that dependent random events have less entropy than independent random 


events. 
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APPENDIX C. THE TRIANGULAR MODEL APPROXIMATION 
(DERIVATION) 


A. PDF 
The approximate polar NRZ pdf (shown in Figure 19 and reproduced below) was 
found by breaking the pdf into the regions shown in the figure, and the slope and y - 


intercept of each region was found for non-zero regions according to the slope-intercept 


form z=my+b. 


Approximate Polar NRZ Probability Density Function (pdf) 


4 





Figure 26 Approximate polar NRZ pdf (from Figure 19). 


First, the height of each triangle is found according to the equation for the area of a 
triangle 


A=sbh (134) 


where area A is one-half the base b times the height h. As the total area under a pdf is 
equal to one, the area of each triangle is 0.5. The base b=2y, so 


pee ee 3S. 


135 
b 2y 2y vu 
The slope of regions 2 and 5 is the same, and the slope of regions 3 and 6 is the negative 


of regions 2 and 5. Slope m= Az/Ay and was found to be 


YY _ 
ee ee 
Ay 1-d-y) 2y77- 
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(136) 


The slope is positive in regions 2 and 5 and negative in regions 3 and 6. 


The y-intercept b is found according to the slope-intercept equation z=my-+b. 


For regions 2 and 4, 




















1 (+y7) 
b=z-my =0-—, (+7) =———-— 137 
y ay d+y) ay? (137) 
For regions 3 and 5, 
! Yy-) 
b=z-my=0 j= 138 
y ay (y-V) ay (138) 
The resultant expression for the approximate polar NRZ pdf q'(y) 1s 
0 yay) 
1 1+y 
—yt —i+y)sy<-l 
ay? oy? d+y)sy 
—1 y—l 
= l<y<y-l 
ay? ¥ oy? Be 
q(y) = 40 pus yoley. (139) 
1 vy—-l 
— yt l-y<y<l 
ay? y oy yy 
—1 l+y 
— l<sy<l+ 
ay? Ba ay? y y 
0 y>lt+y 
B. CDF 
The approximate cdf g(y) was found by integrating the pdf g'(y). 
For region 2, 
y 1 ; y 
ay)=fa'o= [| (my'+b)dy'=<my"+ by’ (140) 
-(1+y) 2 -(1+7) 
1 
ay) =5m(y? +7") +b(y- d=). (141) 
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For region 3, 


For region 5, 


For region 6, 


y 
q(y) = fq) = 0.25 + | (my'+b)dy'= 025+] Lm 74 by’ 
-1 


BA 
1 
g(y)= Jay) =05+ | (my'+b)dy'=0.5+ my "+ by’ 


1 2 
a(y)=0254| Am(1-y")+5(149)} 


ley 


y 
| 





y 





l-y 


u(y) =054| -m(y°-d-7")+6(y-d-7)] 


y 
q(y) =|q\y) -0.75+ (m+) = 0.5] Ly 
1 


1 2 
qy) = 0.154] m(1-y )+o(y-0)] 





: 


The resultant expression for the approximate polar NRZ cdf q(y) is 


qy)= 


0 


5m(y?-+7)*)+o(y-d=7)) 


0.254] 


0.5 


a(y)=054| 5m(y°--7")+0(9-0-1)] 


1 2 
u(y) =0.754| 2m(1~y )+o(y-1) 


1.0 


1 


—m 
p) 


(I-y?)+o(14y) 
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y<-U+y7) 
—l+y)< y<-l 
-l<sy<y-l 
y-lsy<l-y 
l-y<y<l 
l<y<lt+y 


y>l+y 


(142) 


(143) 


(144) 


(145) 


(146) 


(147) 


(148) 
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APPENDIX D. INFOMAX CODE FOR HIGH-KURTOSIS SIGNALS 
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hyperbolic tangent 


multiple iterations of a modified gradient ascent function 


3 high-kurtosis signals 
cdf/pdf 
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All inputs are entered here % 
for ease of adapting code 
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(make modifications for add'l sigs below) 
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Enter maximum # of iterations for Gradient Ascent 


Enter number of signals 
Enter number of samples 
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N = 10000; 
maxiter = 100; 


2 


% Enter inital step size for Gradient Ascent 
eta = a1; 

% Enter increased step size for Gradient Ascent 
alpha = 1.2; 

% Enter decreased step size for Gradient Ascent 
beta = 0.1; 


% Enter # of times to repeat Gradient Ascent 


gradasiter = 5; 


% MUST DEFINE cdf, pdf, dpdf in GRADIENT ASCENT function % 


ole 


% Set Random Number State 


ole 


state=2; rand('state',state); randn('state',state) 


% Create "random" signals... 


M =; % # of signals 
N = N; & # of samples 
i = 1:N; 


load chirp; sl=y(i); sl=sl/std(s1l); 
load gong; s2=y(i); s2=s2/std(s2); 


load splat; s3=y(i); s3=s3/std(s3); 


% Create "unknown" signal mixture x 


w = rand(M); 

s = [sl,s2,s3]; Scolumn vectors 
X= Ss * w; 

mixture = x; 


% Perform GRADIENT ASCENT repetitions 
for j = l:gradasiter; 
if j == 1 


W = eye(M); Sinitializes lst W to identity matrix 


W=2 * 3 * rand(M); Srandomly selects subsequent W, increases w/ j 


end 


[y,hs,grads,etas,W] = GradientAscent_highkurtosis (N,maxiter, alpha,beta,eta,x,W); 


SCreate temporary arrays to store values from each iteration 
ytemp(j,:,:) = yi 


hstemp(j,:,:) = hs; 
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gradstemp(j,:,:) = grads; 
etastemp(j,:,:) = etas; 
Wtemp(j,:,:) = W; 


end 


% Select iteration with highest ENTROPY 


[maxh, index] = max(max(hstemp')); 


% Select y, hs, grads, etas, W from highest entropy index 


y(:,:) = ytemp(index,:,:); 
hs(:,:) = hstemp(index,:,:); 
grads(:,:) = gradstemp(index,:,: 
etas(:,:) = etastemp(index,:,:); 
W(:,:) = Wtemp(index,:,:); 


yl = y(:,1); y2 = y(:,2); y3 = 


SPlot original signals 


figure(1); 


subplot (3,M,1); plot(i,sl); title('\bfSignal 1"'); 


ylabel ('Voltage') 


subplot (3,M,2); plot(i,s2); title('\bfSignal 2'); 


ylabel ('Voltage') 


subplot (3,M,3); plot(i,s3); title('\bfSignal 3'); 


ylabel ('Voltage') 


SPlot signal mixtures 


subplot (3,M,4); plot (i,mixture(:,1)); 


time'); ylabel('Voltage"') 


subplot (3,M,5); plot (i,mixture(:,2)); 


time'); ylabel('Voltage"') 


subplot (3,M,6); plot (i,mixture(:,3)); 


time'); ylabel('Voltage"') 


SPlot extracted signals 


subplot (3,M,7); plot(i,yl); title('\bfExtracted Signal 1'); 


ylabel ('Voltage') 


% Break out extracted signals (column vectors) 


Sfinds highest entropy index 


xlabel ('discrete time'); 


xlabel ('discrete time'); 


xlabel ('discrete time'); 


title('\bfSignal Mixture 1'); xlabel('discrete 


title('\bfSignal Mixture 2'); xlabel('discrete 


title('\bfSignal Mixture 3'); xlabel('discrete 


xlabel('discrete time'); 


subplot (3,M,8); plot(i,y2); title('\bfExtracted Signal 2'); xlabel('discrete time'); 


ylabel ('Voltage') 


subplot (3,M,9); plot(i,y3); title('\bfExtracted Signal 3'); xlabel('discrete time'); 


ylabel ('Voltage') 


% % Plot PDFs of "Random" Signals (Optional) 
& figure(2); subplot(2,1,1); hist(sl,N/100); title('\bfPDF of Signal 1') 


& figure(2); subplot(2,1,2); hist(s2,N/100); title('\bfPDF of Signal 2') 


SPlot change inh and gradient magnitude during optimization 

figure (3); 

subplot (3,1,1);plot (hs) ; 

title('\bfFunction Values - Entropy');xlabel('Iteration'); ylabel('h(Y)"'); 
subplot (3,1,2);plot (grads) ; 


title('\bfMagnitude of Entropy Gradient');xlabel('Iteration'); ylabel('Gradient 
Magnitude'); 


subplot (3,1,3);plot (etas) ; 





title('\bfMagnitude of Eta');xlabel('Iteration'); ylabel('Eta Magnitude'); 
% % Modified plots of signals and extracted signals 

& figure (4); 

% subplot (1,M,1); plot(i,sl); title('\bfSignal 1'); 

% subplot (1,M,2); plot(i,s2); title('\bfSignal 2'); 


% subplot (1,M,3); plot(i,s3); title('\bfSignal 3'); 


& figure(5); 


% plot(i,x); title('\bfSignal Mixtures') 


& figure (6); 
% subplot (1,M,1); plot(i,yl); title('\bfExtracted Signal 1'); 
% subplot (1,M,2); plot(i,y2); title('\bfExtracted Signal 2"'); 


% subplot (1,M,3); plot(i,y3); title('\bfExtracted Signal 3'); 


% Listen to audio signals 


% [10000] Fs Sample rate of speech. 


listen=1; 


Fs=10000; 
if listen soundsc(yl,Fs); Ssoundsc(y2,Fs); 
end 


w; % Mixing Matrix 
W; & Unmixing Matrix 


I=w*W % should yield Identity matrix 
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APPENDIX E. GRADIENT ASCENT FUNCTION (HIGH- 
KURTOSIS) 
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GRADIENT ASCENT for HIGH KURTOSIS signals 


2 
© 


oe 


"step" 


This function performs a Gradient Ascent algorithm that varies its 


2 
© 


oe 


and dpdf required! 


pdf, 


size as a maximum is approached - input cdf, 


decreased step size % 


beta 
ghkurtosis (N,maxiter, alpha,beta,eta,x,W) 


increased step size, 


alpha = 
hyperbolic tangent 
GradientAscent_hi 


initial step size, 
DPDF INPUTS: 


eta = 
PDF, 


CDF, 
S[y,hs,grads,etas,W] 
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beta, 


alpha, 


maxiter, 


zeros (maxiter,1); 


_highkurtosis(N, 
and W 
etas 


eta, 


GradientAscent_hi 
grad, 


WwW) 
zeros (maxiter,1); 


etas, 
gs 


grads, 


hs, 
l:maxiter 


x, W) 
zeros (maxiter,1); 


Create arrays to store values of h, 


functionl[y, 
eta, 

hs 

for iter 
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INPUT for Gradient Ascent % 
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gly) ] 
=g' (y) ] 
[dpdf=g'' (y) ] 
+2 * tanh(y).%3; 


[Y= 
[cdf'=pdf 
22) 


(1 - tanh(y) 


tanh (y); 
Input PDF derivative 


Input model CDF 
Input model PDF 
dpdf = -2 * tanh(y) 


pdf = 
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End INPUT - more in ‘'else' below! 


oO 
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(for high-kurtosis sigs only); 


+ log(detw) ; 


—2*Y 
81 


psi 


1); 
(h got smaller - entropy decreased) 


Sor: 


S(if entropy increased) 
dL.) 


1) 


(eps + pdf); 
(iter - 


alpha * etas(iter - 


./ 


abs (det (W) ); 
* sum(sum(log(eps + pdf) )) 


(dpdf) 
eta = 
& h < hs 


(1 / N) 


if h > hs(iter - 


Calculate entropy for current iteration 
else 


psi 
detW 
if iter > 1 


2 


h 


W = Wold; 


beta * etas(iter - 1); 


eta 


oO 


2 


pdf as above! 


INPUT for ELSE loop % 
tanh(y); 


Use same cdf, 
Input model CDF 
Input model PDF 


a 
© 
a 
© 
a 
© 
a 
© 


sO 2) 55 


(1-tanh (y) 


pdf = 


Input PDF derivative 


a 
© 


-2*tanh(y)+2*tanh(y) .*3; 


dpdf 


End INPUT 


a 
© 


psi for Wold 


h, 


sRecalculate detW, 


abs (det (W) ); 


detW 


+ log(detw) ; 


(1/N) *sum(sum(log(epstpdf) ) ) 


he = 


./(eps+pdf) ; 


= (dpdf) 


psi 


end 


a 


% iter <= 


else 


W + eta * grad; 


We= 


eta, W 


grad, 


SRecord h, 


Wold = W; 


eta; 


=h; grads(iter) = norm(grad(:)); etas(iter) = 


hs (iter) 


end 


grads, etas 


hs, 


OUTPUTS y, 
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APPENDIX F. POLAR NRZ SIGNAL FUNCTION 


% POLAR NRZ SIGNAL & 
% This function returns a vector representing a polar NRZ signal. % 


% Ss = polarNRZ(totaltime, bitrate, samplerate) % 


function s = polarNRZ(totaltime, bitrate, samplerate) 
& STEP 1 - Generate random bit sequence 


bits = rand(totaltime * ceil(bitrate), 1) < 0.5; 


bits: =-2)* “(brts: = .5)7 Sconverts bits from 0,1 to -1,1 
& STEP 2 - Duplicate bit sequence n times, where n = samplerate / bitrate 


n = ceil(samplerate/bitrate); 


x= [ J; 
for i= 1:n 
x = [x,bits]; % duplicates column vector 
end 
% STEP 3 - Concatenate rows of x 
yoSEL AG 


for i = 1l:length(bits) 


y = ly x(4,:)]3 


end 
actualsamples = totaltime * samplerate; 
s = y(l:actualsamples) ; 


& STEP 4 - change first bit to be random (i.e. not always at the beginning of a bit) 


shift = ceil (rand*samplerate/bitrate) ; 


s = [s(shift + l:actualsamples) s(l:shift)]'; 
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APPENDIX G. POLAR NRZ CDF FUNCTION: TRIANGULAR 
MODEL 


ole 


POLAR NRZ CDF % 


ole 


This function returns an approximate cdf of a polar NRZ signal % 


% using the triangular model with gamma as small base constant % 


6 Z = polarNRZcdf(y, gamma) % 
09999999999990999999909090999990999999999999909999999099999999999999099099 
COTOCTCFC OOOO OC OOOO OOOOODOO OOOO OOOOOOOOODOOOOOOODOOOOOOOOOOODOOOODOOOO0O000000 
function z = polarNRZcdf(y, gamma) 

m= 1/ (2 * gamma*2); % slope (pos or neg) 

bpos = 1 + gamma; % numerator of y-intercept (pos) 

bneg = gamma - 1; % numerator y-intercept (neg) 


bdenom = 2 * gamma*2; % denominator of y-intercept b 


bl = bpos / bdenom; % y-intercept (pos) 

b2 = bneg / bdenom; % y-intercept (neg) 

Z = zeros(size(y)); % creates array to store values for cdf 
& Define "critical points" - where equations change 

critPoints = [-(lt+gamma) -1 (-lt+gamma) (1l-gamma) 1 (1+gamma) ]; 


% Region 1 equation (z < -—1-gamma) 


% z= 0 - do nothing! 


% Region 2 equation (-1l-gamma < z <= -1) 

z2 = (0.5 * m* (y.*2 - (1 + gamma)*2) + bl * (y + 1 + gamma)).* (y >= critPoints (1) 
& y < critPoints(2)); 

% Region 3 equation (-1 < z <= -—1l+gamma) 

z3 = (.25 + (.5 * m* (1 - y.*2) + b2 * (1 + y))).* (y >= critPoints(2) & y < 
critPoints(3)); 

% Region 4 equation (-l+gamma < z <= 1-gamma) 

z4 = .5.* (y >= critPoints(3) & y < critPoints(4)); 


% Region 5 equation (l-gamma < z <= 1) 


z5 = (.5 + (.5 * m* (y.%2 - (1 - gamma)*2) + b2 * (y — (1l-gamma)))).* (y >= 
critPoints(4) & y < critPoints(5)); 


% Region 6 equation (1 < z <= 1+gamma) 


z6 = (.75 + (.5 *m* (1 - y.*2) + bl * (y - 1))).* (y >= critPoints(5) & y < 
critPoints(6)); 


% Region 7 equation (z > 1+ gamma) 
z7 = (y >= critPoints(6)); % always 1 
% Combine values of each region for output cdf 


Z= 22 + 23 + 244+ 25 + 26 + 27; 
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APPENDIX H. POLAR NRZ PDF FUNCTION: TRIANGULAR 
MODEL 


ole 


POLAR NRZ PDF % 


ole 


This function returns an approximate pdf of a polar NRZ signal % 
% using the triangular model with gamma as small base constant % 


6 Z = polarNRZpdf(y, gamma) % 


function z = polarNRZpdf(y, gamma) 


m= 1/ (2 * gamma*2); % slope (pos or neg) 
bpos = 1 + gamma; % numerator of y-intercept (pos) 
bneg = gamma - 1; % numerator y-intercept (neg) 


bdenom = 2 * gamma*2; % denominator of y-intercept b 


bl = bpos / bdenom; % y-intercept (pos) 

b2 = bneg / bdenom; % y-intercept (neg) 

Z = zeros(size(y)); % creates array to store values 

%& Define "critical points" - where equations change 
critPoints = [-(l+gamma) -1 (-l+gamma) (1l-gamma) 1 (1+gamma) ]; 
%& Region 1 (z < -l-gamma) - always 0 

% Region 2 equation (-1l-gamma < z <= -1) 


z2 = (m * y + bl).*( y >= critPoints(1) & y < critPoints(2)); 


%& Region 3 equation (-1 < z <= -1l+gamma) 
z3 = (-m * y + b2).* (y >= critPoints(2) & y < critPoints(3)); 
% Region 4 (-l+gamma < z <= 1-gamma) - always 0 


% Region 5 equation (l-gamma < z <= 1) 
z5 = (m * y + b2).* (y >= critPoints(4) & y < critPoints(5)); 
% Region 6 equation (1 < z <= 1+gamma) 
z6 = (-m * y + bl).* (y >= critPoints(5) & y < critPoints(6)); 


%& Region 7 (z > 1+ gamma) - always 0 


% Combine values of each region for output pdf 


Z= 22 + 23 + z5 + 26; 
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APPENDIX I. POLAR NRZ DPDF FUNCTION: TRIANGULAR 
MODEL 


ole 


POLAR NRZ DPDF % 


ole 


This function returns an approximate dpdf of a polar NRZ signal % 
% using the triangular model with gamma as small base constant % 


6 zZ = polarNRZdpdf(y, gamma) % 


function z = polarNRZdpdf(y, gamma) 


2 


m= 1/ (2 * gamma*2); % slope (pos or neg) 


2 


zeros (size(y)); % create array to store values 


N 
Il 


2 


% Define "critical points" - where equations change 


critPoints = [-(lt+gamma) -1 (-lt+gamma) (1l-gamma) 1 (l1+gamma) ]; 


%& Region 1 (z < -l-gamma) - always 0 

% Region 2 equation (-1l-gamma < z <= -1) 

z2 = (m).* (y >= critPoints(1) & y < critPoints(2)); 
% Region 3 equation (-1 < z <= —1+gamma) 

z3 = (-m).* (y >= critPoints(2) & y < critPoints(3)); 
% Region 4 (-l+gamma < z <= 1-gamma) - always 0 

% Region 5 equation (l-gamma < z <= 1) 

z5 = (m).* (y >= critPoints(4) & y < critPoints(5)); 


% Region 6 equation (1 < z <= 1+gamma) 


z6 = (-m).* (y >= critPoints(5) & y < critPoints(6)); 
%& Region 7 (z > 1+ gamma) - always 0 


% Combine values of each region for output dpdf 


Z= 22 + 23 + z5 + 26; 
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TANH MODEL 


APPENDIX J. POLAR NRZ CDF FUNCTION 
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using the modified hyperbolic tangent model w/ compression factor sigma 
sigma) 


This function returns an approximate cdf of a polar NRZ signal 


Z = polarNRZcdf2(y, 


POLAR NRZ CDF2 
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(y < critPoint); 
(y >= critPoint); 


* 
* 
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ALY w 
aoeee ys 


- where equation changes 
(y + 1)) 
(y -— 1)) 


sigma) 
creates array to store values 


a 
© 


0 


polarNRZcdf2 (y, 

"critical point" 
(tanh(sigma * 
(tanh(sigma * 


<2. % 


zeros (size(y)); 
(25% 


Combine values of each region for output cdf 


Equation for z < 0 
Equation for z > 


Define 


Z= zl + 22; 


critPoint = 0; 


function z 
Zz 
2 
3 
2 
3 
zl 
2 
3 
Z2 
2 
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TANH MODEL 


APPENDIX K. POLAR NRZ PDF FUNCTION 
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using the modified hyperbolic tangent model w/ compression factor sigma 
sigma) 


This function returns an approximate pdf of a polar NRZ signal 


Z = polarNRZpdf2(y, 


POLAR NRZ PDF2 
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(y < critPoint); 
(y >= critPoint); 


* 


Pa a0 1 ie 
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(y + 1)) 


- where equation changes 


sigma) 
creates array to store values 


(1 - tanh(sigma * 
(1 - tanh(sigma * 


a 
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0 


* 
* 


polarNRZpdf2 (y, 
"critical point" 


((sigma / 4) 


zeros (size(y)); 
((sigma / 4) 


Combine values of each region for output pdf 


Equation for z < 0 
Equation for z > 


Define 


Z= zl + 22; 


critPoint = 0; 


function z 
Zz 
2 
3 
2 
3 
zl 
2 
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Z2 
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APPENDIX L. POLAR NRZ DPDF FUNCTION: TANH MODEL 


% POLAR NRZ DPDF2 % 
% This function returns an approximate pdf derivative of a polar NRZ signal % 
% using the modified hyperbolic tangent model w/ compression factor sigma % 


% z = polarNRZdpdf2(y, sigma) % 


function z = polarNRZdpdf2(y, sigma) 


Z = zeros(size(y)); % creates array to store values 
% Define "critical point" - where equation changes 
critPoint = 0; 


2 


% Equation for z < 0 


zl = -((sigma*2) / 2).* (tanh(sigma * (y + 1)).* polarNRZpdf(y, sigma)).* (y < 
critPoint); 


% Equation for z >= 0 


z2 = -((sigma*2) / 2).* (tanh(sigma * (y - 1)).* polarNRZpdf(y, sigma)).* (y >= 
critPoint); 


% Combine values of each region for output dpdf 


z= zl + 22; 
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APPENDIX M. INFOMAX CODE FOR POLAR NRZ SIGNALS 
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modified hyperbolic tangent function 


3 polar NRZ signals 


multiple iterations of a modified gradient ascent function 
cdf/pdf 


DESCRIPTION 


© 
oO 


2 


ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
oe 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
oe 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 
ole 


randn('state',state) 


rand('state',state); 


, 
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% Set Random Number State 
state= 


format compact 
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clear 
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INPUT Section 
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All inputs are entered here % 
for ease of adapting code 
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(will need to make modifications for add'l sigs below) 
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Enter number of signals 
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x 
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% Enter number of samples 

N = 100; 

% Enter lenth of time vector in seconds 
totaltime = 5; 

% Enter sample rate 

samplerate = 100; 

% Enter maximum # of iterations for Gradient Ascent 
maxiter = 100; 

% Enter inital step size for Gradient Ascent 
eta = .05; 

% Enter increased step size for Gradient Ascent 
alpha = 1.2; 

% Enter decreased step size for Gradient Ascent 
beta = 0.1; 

% Enter # of times to repeat Gradient Ascent 
gradasiter = 100; 
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% Enter new W step for multiple Gradient Ascent iterations 
step = 0.01; 


% Enter compression factor for polar NRZ functions 


sigma = 2; 


% MUST DEFINE cdf, pdf, dpdf in GRADIENT ASCENT function % 


ole 


Create "random" signals... 


M=M; %& # of signals, this will have to be changed be 
N =N; %S # of samples 

i = 1:N; 

S$sl=(rand(1)).*polarNRZ (totaltime,10*rand(1),samplerate) ; 
S$s2=(rand(1)).*polarNRZ (totaltime,10*rand(1),samplerate) ; 
sl = (rand(1)).*polarNRZ(totaltime,10*rand(1),samplerate) ; 
s2 = (rand(1)).*polarNRZ(totaltime,10*rand(1),samplerate) ; 
s3 = (rand(1)).*polarNRZ(totaltime,10*rand(1),samplerate) ; 
s = [sl s2 s3]; %column vectors 


ole 


Create "unknown" signal mixture x 


w = rand(M); 
X= sS * w; 
mixture = x; 


% Perform GRADIENT ASCENT repetitions 


for j = 1l:gradasiter; 
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low if M is increased 


$sl=sl/std(sl); 


$s2=s2/std(s2); 


= 
Il 


eye(M); Sinitializes lst W to identity matrix 


= 
Il 


step * j * rand(M); Srandomly selects subsequent W, increases w/ j 


end 
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% Call GradientAscent function to perform gradient ascent 


[y,hs,grads,etas,W] = GradientAscent_polarNRZ2 (N,maxiter,alpha,beta,eta,x,W, sigma) ; 


% Create temporary arrays to store values from each iteration 


ytemp(j,:,:) = ys 
hstemp(j,:,:) = hs; 
gradstemp(j,:,:) = grads; 
etastemp(j,:,:) = etas; 
Wtemp(j,:,:) = W; 


end 


% Select iteration with highest ENTROPY 


[maxh, index] = max(max(hstemp')); %finds highest entropy index 


% Select y, hs, grads, etas, W from highest entropy index 


y(:,:) = ytemp(index,:,:); 

hs(:,:) = hstemp(index,:,:); 

grads(:,:) = gradstemp(index,:,:); 

etas(:,:) = etastemp(index,:,:); 

W(:,:) = Wtemp(index,:,:); 

% Break out extracted signals - using column vectors 


yl = y(:,1l)s y2 = y(:,2)% y3 = y(:,3)3 


SPlot original signals 
figure(1); 


subplot (3,M,1); plot(sl); title('\bfSignal 1'); xlabel('discrete time'); 
ylabel ('Voltage') 


subplot (3,M,2); plot(s2); title('\bfSignal 2'); xlabel('discrete time'); 
ylabel('Voltage') 
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subplot (3,M,3); plot(s3); title('\bfSignal 3'); 


ylabel ('Voltage') 


SPlot signal mixture 


subplot (3,M,4); plot (mixture(:,1)); 


time'); ylabel('Voltage"') 


subplot (3,M,5); plot (mixture(:,2)); 


time'); ylabel('Voltage') 


subplot (3,M,6); plot (mixture(:,3)); 


time'); ylabel('Voltage"') 


SPlot extracted signals 


title('\bfSignal 


title('\bfSignal 


title('\bfSignal 


subplot (3,M,7); plot(yl); title('\bfExtracted Signal 


ylabel ('Voltage') 


subplot (3,M,8); plot(y2); title('\bfExtracted Signal 


ylabel ('Voltage') 


subplot (3,M,9); plot(y3); title('\bfExtracted Signal 


ylabel ('Voltage') 


Mixture 1'); 


Mixture 2'); 


Mixture 3'); 


age 
2"); 
2'); 


xlabel ('discrete time'); 


xlabel('discrete 


xlabel('discrete 


xlabel('discrete 


xlabel ('discrete time'); 


xlabel ('discrete time'); 


xlabel ('discrete time'); 


SPlot change inh and gradient magnitude during optimization 


figure (3); 


subplot (3,1,1); plot(hs); title('\bfFunction Values - Entropy'); 


xlabel ('Iteration'); 


ylabel("h(Y)'); 


subplot (3,1,2); plot(grads); title('\bfMagnitude of Entropy Gradient'); 


xlabel ('Iteration'); 


ylabel('Gradient Magnitude'); 


subplot (3,1,3); plot(etas); title('\bfMagnitude of Eta'); 


xlabel ('Iteration'); 


% % Plot PDFs of "Random" Signals 
%& figure (2);subplot(2,1,1); hist(s1,N/100); 


% figure (2);subplot(2,1,2); hist(s2,N/100); 


w; %& Mixing Matrix 


W % Unmixing Matrix 


ylabel('Eta Magnitude'); 


(Optional) 


I = w*W % should yield Identity matrix 


% Compute optimum h 

pdfopt = polarNRZpdf (s, sigma) ; 
detWopt = abs (det (inv(w))); 
hopt = 
h = max (hs) 


index 


(1/N) *sum(sum(log(epstpdfopt) ) ) 
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+ log(detWopt) 


title('\bfPDF of Signal 1') 


title('\bfPDF of Signal 2') 


APPENDIX N. GRADIENT ASCENT FUNCTION (POLAR NRZ - 
TANH MODEL) 
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GRADIENT ASCENT for POLAR NRZ signals 


oe 


%& This function performs a Gradient Ascent algorithm that varies its "step" % 
% size as a maximum is approached - input cdf, pdf, and dpdf required! % 
& eta = initial step size, alpha = increased step size, beta = decreased step size % 
% CDF, PDF, DPDF INPUTS: modified hyperbolic tangent function for approximation % 
& [y,hs,grads,etas,W] = GradientAscent_polarNRZ2 (N,maxiter,alpha,beta,eta,x,W,sigma) % 
ee ee Ee ee Ee EEE Ee EE EEE EEE EEE LE EEE EEE EEE ELE EEE EEE EEE ERE EER ERE EE ES 
EEEESESESESESESESESESESESESESESESESESEEEEEEEEEEEESESESESESESESESESESESESEEEEEEEEEEEEES 


function[y, hs, grads, etas, W] = GradientAscent_highkurtosis(N, maxiter, alpha, beta, 
eta, x, W, sigma) 
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% Create arrays to store values of h, grad, eta, and W 


hs = zeros (maxiter,1)j; gs = zeros(maxiter,1); etas = zeros(maxiter,1); 
for iter = l:maxiter 
yux * W; 


% Input model CDF [Y=g(y) ] 

Y = polarNRZcdf2(y, sigma) ; 

% Input model PDF [cdf'=pdf=g' (y) ] 

pdf = polarNRZpdf2(y,sigma) ; 

% Input PDF derivative [dpdf=g'' (y) ] 

dpdf = polarNRZdpdf2(y,sigma) ; 

% End INPUT - more in 'else' below! % 

psi = (dpdf)./ (eps + pdf); 

detW = abs (det (W)); 

% Calculate entropy for current iteration 


h = (1 / N) * sum(sum(log(eps + pdf))) + log(detW) ; 


if iter > 1 


if h > hs(iter - 1) % (if entropy increased) 
eta = alpha * etas(iter - 1); 
else %h<hs(iter-1): (h got smaller - entropy decreased) 
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W = Wold; 


beta * etas(iter - 1); 


eta 


INPUT for ELSE loop % 


a 
© 


a 
© 


pdf as above! 


Use same cdf, 


a 
© 


Input model CDF 


a 
© 


polarNRZcdf2 (y,sigma) ; 


Sa 


Input model PDF 


a 
© 


polarNRZpdf2 (y, sigma) ; 


pdt 


Input PDF derivative 


a 
© 


polarNRZdpdf2 (y, sigma) ; 


dpdf 


ole 


End INPUT 


a 
© 


psi for Wold 


h, 


sRecalculate detW, 


abs (det (W) ); 


detW 


+ log(detw) ; 


* sum(sum(log(eps + pdf) )) 


(1 / N) 
wil, 


he = 


(eps + pdf); 


= (dpdf) 


psi 


end 


a 


% iter <= 


else 


W + eta * grad; 


We= 


eta, W 


grad, 


SRecord h, 


Wold = W; 


eta; 


=h; grads(iter) = norm(grad(:)); etas(iter) = 


hs (iter) 


end 


grads, etas 


hs, 


OUTPUTS y, 
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